
A target is located in one of $n$ boxes. Initially, the target is in
box $i$ with a given prior probability $p_i^0$, $\sum_i p_i^0 = 1$. A
sequential search is made. Searching box $i$ costs $c_i > 0$ and finds the
target with probability $\alpha_i$ (i.e., the overlook probability is
$1 - \alpha_i$) if the target is in the box at that time. A reward $R_i$
is earned if the target is found in box $i$. A strategy is any rule for
determining when to search, and if so, which box. The objective is to
maximize the probability of finding the target in a given number of
searches or to minimize the risk (expected searching cost minus expected
reward).

In the above model, suppose $n = 2$ and the objective is to minimize the
risk. Consider the optimal strategy as a function of the state (defined as
the posterior probability vector). Let $S_0$ be the set of states for which
an optimal strategy stops searching. Let $S_i$ be the set of states for
which an optimal strategy searches box $i$, $i = 1, 2$. A counterexample
shows that although $S_0$ is a convex set, surprisingly, $S_i$ need not be
convex.

A  moving  target  model  is  studied  in  which  a  target  is 
assumed  to  move  from  box  to  box  in  accordance  with  a  Markov 
transition  probability  matrix.  Conditions  are  given  so  that 
the  optimal  strategy  can  be  characterized  for  a  general  n 
box  model. 

In an optimal search model with random overlook probabilities, the
$\alpha_i$'s are allowed to be random variables. For instance, the
$\alpha$'s may be random due to weather conditions. Let $\alpha_i^t$ be the
$\alpha$'s at the $t$-th stage, told after the $t$-th search. For fixed
$i$, it is assumed that $\alpha_i^1, \alpha_i^2, \ldots$ are independent
identically distributed random variables. The following results are
derived. To maximize the probability of finding a target in a given number
of searches, an optimal strategy searches at each time a box with
$\max_i p_i E\alpha_i$. To minimize the expected searching cost before
finding the target, an optimal strategy searches at each time a box with
$\max_i \frac{p_i E\alpha_i}{c_i}$. Although these results resemble the
classic results for a model with deterministic $\alpha_i$'s, the proofs are
entirely new.
CHAPTER 1


INTRODUCTION 


1.1  Introduction  of  the  Model 

Optimal  search  models  have  been  of  theoretical  interest  as  well 
as  practical  importance.  In  practical  application,  the  most  frequently 
encountered  problem  of  this  type  would  be  the  optimal  search  of  a 
target.  The  target  may  be  in  any  one  of  m  regions,  which  is  the 
same  as  saying  a  ball  may  be  in  any  one  of  the  m  boxes  as  treated  in 
this  thesis.  An  optimal  decision  is  desired  as  to  which  region  to  search 
in  order  to  find  or  hit  the  target.  Prior  to  the  search,  it  is  assumed 
that  the  probability  distribution  of  the  location  of  the  target  is 
known.  Suppose  further  that,  due  to  technical  errors  or  other  reasons, 
one  might  miss  the  target  even  when  the  correct  location  is  searched. 

Thus  after  the  search,  if  one  misses  the  target,  some  information  is 
gained  and  used  for  the  next  search.  The  problem  is  to  find  an  optimal 
sequence  of  searches  in  order  to  maximize  the  probability  of  finding  the 
ball  in  a  finite  number  of  searches  or  to  minimize  the  expected  searching 
cost  before  finding  the  target. 

One can easily think of some possible complications of the above
problem. For example, the target may be moving, or the overlook
probability may be random due to weather conditions. These are the
aspects of the problem which will be investigated in this thesis. A
more precise mathematical model will be given later.

1.2  Background 

The problem of optimal search has been studied by many authors, among
them Blackwell, Chew, Ross, Kadane, and Pollock.
A simple optimal search model is as follows. Suppose a ball is in one of
$m$ boxes. Initially, the ball is known to be in box $i$ with probability
$p_i^0$, $\sum_{i=1}^{m} p_i^0 = 1$. A sequential search is made.
Searching box $i$ incurs a cost $c_i > 0$. The probability of finding the
ball is $\alpha_i$ (i.e., the overlook probability is $1 - \alpha_i$) if a
search is made in box $i$, given that the ball is in that box. After a
search, if the ball is found then the searching process terminates. If the
ball is not found, then the searching process continues. The objective is
to minimize the expected cost before finding the ball.

Blackwell (1962) characterized an optimal strategy for the above model.
He showed that an optimal strategy is to search at any time that box with
$\max_i \frac{\alpha_i p_i}{c_i}$, $p_i$ being the posterior probability of
the ball being in box $i$ at that time.
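Blackwell's rule can be illustrated with a minimal sketch. The function names `posterior_after_miss` and `blackwell_box` and the numerical values are assumptions chosen for illustration; they do not appear in the thesis. Boxes are indexed from 0 in the code.

```python
from fractions import Fraction

def posterior_after_miss(p, alpha, i):
    """Bayes update of the location probabilities after a failed search of box i."""
    miss = 1 - alpha[i] * p[i]  # probability that the search of box i fails
    return [
        (1 - alpha[j]) * p[j] / miss if j == i else p[j] / miss
        for j in range(len(p))
    ]

def blackwell_box(p, alpha, c):
    """Index of a box maximizing alpha_i * p_i / c_i (ties go to the lowest index)."""
    return max(range(len(p)), key=lambda i: alpha[i] * p[i] / c[i])

# Illustrative (assumed) data: two boxes, uniform prior.
p = [Fraction(1, 2), Fraction(1, 2)]
alpha = [Fraction(3, 4), Fraction(1, 2)]
c = [Fraction(1), Fraction(3)]

first = blackwell_box(p, alpha, c)               # box searched first
p_next = posterior_after_miss(p, alpha, first)   # state after an unsuccessful search
```

Repeating the two steps (choose a box by the index, then update the posterior after a miss) generates the whole search sequence for the stationary model.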

Chew (1967) considered the case of equal costs and introduced the option
of stopping at a penalty. He required at least one of the $\alpha_i$'s to
be zero and proved that an optimal stopping rule exists. Some of the
results he obtained are as follows:

1. An optimal strategy either stops or searches the box with
$\max_i \alpha_i p_i$.

2. To maximize the probability of finding the ball in $L$ searches, it is
optimal to search the box with $\max_i \alpha_i p_i$.

Kadane  (1968)  considered  the  problem  of  maximizing  the  probability 
of  finding  the  ball  under  a  budget  ceiling.  He  allowed  the  costs  and 
overlook  probabilities  to  depend  on  the  number  of  searches  made  in  a  box. 
By  applying  the  Neyman-Pearson  Lemma,  he  proved  that,  under  some 
conditions,  it  is  optimal  to  search  the  box  with  maximum  probability 


per  cost. 
Ross (1969) investigated a general optimal search and stop model. He
assumed that a reward $R_i$ is gained if a ball is found in box $i$. If
$R_i = R$, then this is equivalent to a penalty for stopping without
finding the ball. He used a general result on negative dynamic programming
to show that an optimal strategy exists. The main results he obtained are
as follows.

1. The optimal risk, defined as the expected searching cost minus the
expected reward, is a concave function of the initial distribution
$\mathbb{P}^0 = \{p_i^0\}$. The optimal stopping region, defined as the
set of $\mathbb{P}^0$ at which it is optimal to stop, is convex.

2. For the equal rewards but unequal costs case, he proved that an
optimal strategy either stops or searches the box with
$\max_i \frac{\alpha_i p_i}{c_i}$. That is more general than Chew's
result, since costs are allowed to be different, and no requirements on
the $\alpha_i$'s are assumed.

3. For the case where both the rewards and the costs are allowed to be
different, he proved that an optimal strategy either searches the box with
$\max_i \frac{\alpha_i p_i}{c_i}$ or else never searches that box in the
sequence that follows.


Pollock (1970) introduced the optimal search model of a moving target.
He assumed that the target moves from box $i$ to box $j$ with probability
$p_{ij}$ after every search. Otherwise, the model is the same as the
previous ones. He took the model with two boxes and characterized the
optimal strategy for the perfect detection case ($\alpha_i = 1$) and the no
information case (i.e., the matrix $[p_{ij}]$ has identical rows). For the
general case, he incorrectly proved that an optimal initial decision, as a
function of the initial distribution, can be represented as two regions.
This will be discussed in detail in Chapter 3.

1.3  Introduction  of  the  Subsequent  Chapters 

The purpose of this thesis is to consider various versions of optimal
search models. In Chapter 2, an optimal search and stop model with two
boxes is considered. The rewards are assumed equal. Ross's results for this
model, as mentioned in the preceding section, have shown that the stopping
region $S_0$ is convex. A natural question to ask is as follows. If $S_i$,
$i = 1, 2$, is the set of states for which it is optimal to search box $i$,
is it necessarily true that $S_i$ is convex as well? Intuitively, one would
say yes. However, the counterexample in Chapter 2 shows the contrary. It
shows that, in the case of two boxes, where the state variable can be
represented as the probability $p$ that the ball is in box 1, the structure
of an optimal policy can be

    p=0 |--- S_2 ---|--- S_1 ---|--- S_0 ---|--- S_1 ---| p=1
                              state p

where none of the regions need be vacuous. Hence $S_i$ need not be convex.

In Chapter 3, Pollock's optimal search model of a moving target is taken
up. His results for the no information case as well as the perfect
detection case are generalized to the $n$-box case. The proof is quite
different. Next, some results on the model with a Jordan matrix as
transition probability matrix are derived. When a stop option is added, the
following result applies. If, in minimizing the expected number of
searches, it is optimal to search the box with $\max_i \alpha_i p_i$, then
when one is allowed to stop, an optimal strategy either stops or searches
the box with $\max_i \alpha_i p_i$. Finally, a two box optimal search model
with $\alpha_i = \alpha$ and a symmetric transition probability matrix is
studied in detail. Conditions are given to assure that searching the box
with the larger $p_i$ is optimal. Also, under some conditions, the optimal
strategy takes on an alternating searching sequence.

In Chapter 4, the overlook probabilities of an optimal search model are
allowed to be random variables. Specifically, let $\alpha_i^t$ be the
$\alpha$'s at the $t$-th stage. For fixed $i$, it is assumed that
$\alpha_i^1, \alpha_i^2, \ldots$ are independent identically distributed
random variables. Two cases may occur. First, at each stage, the random
overlook probabilities are told after the search. For this case, the
following results are derived. To maximize the probability of finding the
ball in $m$ searches, an optimal strategy searches the box with
$\max_i p_i E\alpha_i$. To minimize the expected cost, an optimal strategy
searches the box with $\max_i \frac{p_i E\alpha_i}{c_i}$. When $\alpha_i^t$
is deterministic and independent of $t$, this reduces to the classic
results due to Blackwell and Chew. Secondly, at each stage, the random
overlook probabilities are told before the search. For this case, it was
hoped that under some restrictions, an optimal strategy would be similar to
that in the first case. Unfortunately, this is not so, as demonstrated by a
number of counterexamples.
CHAPTER 2

A  COUNTEREXAMPLE  FOR  AN  OPTIMAL  SEARCH  AND  STOP  MODEL 

2.1  The  Model 

Consider the type of optimal search and stop model introduced by Ross
[5]. Let there be two boxes. Let $p_i^0$ be the given prior probability
that a ball is hidden in box $i$, $i = 1, 2$, $\sum_i p_i^0 = 1$. A search
of box $i$ costs $c_i$ ($c_i > 0$) and finds the ball with probability
$\alpha_i$ if the ball is in that box. Assume that a reward $R$ is earned
if the ball is discovered. At the beginning of each time period
$t = 1, 2, \ldots$ a searcher may decide to search box 1 or box 2 or to
stop searching. The objective is to find an optimal strategy to maximize
the expected net reward (expected reward minus expected searching cost).

Let the state at any time be characterized by $p_i$, $i = 1, 2$, where
$p_i$ is the posterior probability that the ball is in box $i$ at a certain
time (or stage). Since there are only two boxes, the state at any time can
be represented by a scalar $p$, where $p_1 = p$, $p_2 = 1 - p$. Then the
following results are due to Ross.


i) At any time $t$, an optimal strategy either searches a box with
$\max_i \frac{\alpha_i p_i}{c_i}$ or else stops. In terms of the state $p$,
this implies that there exists a number $p^*$, $0 \le p^* \le 1$, such that
if $p \ge p^*$, an optimal strategy either stops or else searches box 1; if
$p \le p^*$, an optimal strategy either stops or else searches box 2. $p^*$
is determined by

$$\frac{\alpha_1 p^*}{c_1} = \frac{\alpha_2 (1 - p^*)}{c_2}.$$

ii) The stopping region $S_0$, defined as the set of states for which it
is initially optimal to stop, is a convex region, or an interval since $p$
is a scalar.

Let the horizontal coordinate be $p$, $0 \le p \le 1$. Let $S_i$,
$i = 1, 2$, be the set of states for which it is initially optimal to
search box $i$. Then the structure of the optimal policy is characterized
by $S_0$, $S_1$, $S_2$ on $p$. In fact, by i) and ii), if $p^* \in S_0$,
then there exists an optimal policy which has at most three regions as
below.

    p=0 |--- S_2 ---|--- S_0 ---|--- S_1 ---| p=1

This  policy  of  three  regions  is  intuitive.  It  says  that  if  p  , 
the  probability  that  the  ball  is  in  box  1  is  large,  then  box  1  is 
searched;  if  p  is  small,  meaning  that  p2  =  1  -  p  is  large,  then 
box  2  is  searched.  On  the  other  hand,  if  p  is  somewhere  in  the  middle, 
then  stop. 

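At this point, one may raise the question: Could it happen that
$p^* \notin S_0$? If so, then by i) and ii) the structure of the optimal
strategy could be like

    p=0 |--- S_2 ---|(p*)|--- S_1 ---|--- S_0 ---|--- S_1 ---| p=1

or

    p=0 |--- S_2 ---|--- S_0 ---|--- S_2 ---|(p*)|--- S_1 ---| p=1

More precisely,

$$S_0 \text{ nonempty}, \quad p^* \notin S_0, \quad 1 \notin S_0, \quad 0 \notin S_0 \;\Longrightarrow\; \text{there exists an optimal policy which has four regions.}$$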

Contrary  to  intuition,  the  counterexample  will  show  that  the 
structure  of  four  regions  could  occur.  Basically,  it  says  that  one 
might  want  to  search  box  1  when  p  is  large  and  stop  when  p  is 
slightly  smaller.  Then  when  p  is  still  smaller,  surprisingly,  one 
searches  box  1  again  before  searching  box  2. 

2.2  The  Counterexample 

A strategy is any sequence (or partial sequence)
$\delta = (\delta_1, \ldots, \delta_s)$ where
$\delta_i \in \{1, 2, \ldots, m\}$ for $i = 1, \ldots, s$ and
$s \in \{0, 1, 2, \ldots, \infty\}$. The policy $\delta$ instructs the
searcher to search box $\delta_i$ at the $i$th stage and to stop searching
if the object has not been found after the $s$th search. $s = 0$ means that
the searcher stops immediately and $s = \infty$ means that he does not stop
until he finds the ball.

For any strategy $\delta$ and any state $p$, $0 \le p \le 1$, let

$f(p, \delta)$ = the expected net reward (expected reward minus expected
searching cost) incurred when $p$ is the prior probability that the ball is
in box 1 and strategy $\delta$ is employed.

Let $f(p) = \sup_\delta f(p, \delta)$. The following lemma will be used in
the counterexample.
Lemma 2.1:

Let $\delta^0$, $\delta^1$, $\delta^2$ be some searching strategies and
$\delta^*$ be the strategy of searching the box with
$\max_{1,2} \left\{ \frac{\alpha_1 p}{c_1}, \frac{\alpha_2 (1 - p)}{c_2} \right\}$
until finding the ball. Then the following conditions imply that the
structure of the optimal policy may have four regions.

$$f(1, \delta^1) > 0, \quad f(0, \delta^0) > 0, \quad f(p^*, \delta^2) > 0, \quad f(p^*, \delta^*) < 0.$$

Proof:

$$f(1) \ge f(1, \delta^1) > 0 \;\Rightarrow\; 1 \notin S_0,$$
$$f(0) \ge f(0, \delta^0) > 0 \;\Rightarrow\; 0 \notin S_0,$$
$$f(p^*) \ge f(p^*, \delta^2) > 0 \;\Rightarrow\; p^* \notin S_0.$$

Suppose the structure of the optimal policy has no stopping region, i.e.,
the optimal strategy never stops. Then clearly $\delta^*$ is optimal for
all $p \in [0, 1]$, which implies $f(p^*, \delta^*) \ge 0$. Therefore,
$f(p^*, \delta^*) < 0$ implies that the stopping region $S_0$ is not
vacuous. It follows from i) and ii) that there exists an optimal policy
which has four regions.

Q.E.D.

It  remains  to  find  numerical  values  for  the  parameters  so  that 
the  conditions  in  the  lemma  are  satisfied. 


Let $R = 6.6$, $\alpha_1 = 3/4$, $\alpha_2 = 1/2$, $c_1 = 1$, $c_2 = 3$,

$\delta^0$ = keep on searching box 2 until finding the ball.

$\delta^1$ = keep on searching box 1 until finding the ball.

$\delta^2$ = search box 1 then box 2 then stop.

$\delta^3$ = the sequence used by following $\delta^*$, given that the
initial state is $p^*$.


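Let $p^{(i)}$ be the posterior probability of the process after the $i$th
stage given that the initial state is $p^*$ and that $\delta^*$ is used. At
$p^* = 2/11$, $\frac{\alpha_1 p^*}{c_1} = \frac{\alpha_2 (1 - p^*)}{c_2}$.
Hence $\delta^*$ says one may search either box 1 or box 2, i.e.,
$\delta^3_1 = 1$ or $2$. Suppose $\delta^3_1 = 1$, then

$$\frac{\alpha_1 p^{(1)}}{c_1} : \frac{\alpha_2 (1 - p^{(1)})}{c_2} = (1 - \alpha_1) \frac{\alpha_1 p^*}{c_1} : \frac{\alpha_2 (1 - p^*)}{c_2} = (1 - \alpha_1) : 1 = 1/4 : 1 \;\Rightarrow\; \delta^3_2 = 2,$$

$$\frac{\alpha_1 p^{(2)}}{c_1} : \frac{\alpha_2 (1 - p^{(2)})}{c_2} = (1 - \alpha_1) : (1 - \alpha_2) = 1/4 : 1/2 \;\Rightarrow\; \delta^3_3 = 2,$$

$$\frac{\alpha_1 p^{(3)}}{c_1} : \frac{\alpha_2 (1 - p^{(3)})}{c_2} = (1 - \alpha_1) : (1 - \alpha_2)^2 = 1 : 1 \;\Rightarrow\; \delta^3_4 = 1 \text{ or } 2 \;\Rightarrow\; p^{(3)} = p^*.$$

It follows that $\delta^*$ can be a periodic sequence, namely
$\delta^3 = 122, 122, \ldots$. Consequently

$$\begin{aligned}
f(p^*, \delta^*) = f(p^*, \delta^3)
&= \alpha_1 p^* R - c_1 + (1 - \alpha_1 p^*)\left[\alpha_2 (1 - p^{(1)}) R - c_2\right] \\
&\quad + (1 - \alpha_1 p^*)\left[1 - \alpha_2 (1 - p^{(1)})\right]\left[\alpha_2 (1 - p^{(2)}) R - c_2\right] \\
&\quad + (1 - \alpha_1 p^*)\left[1 - \alpha_2 (1 - p^{(1)})\right]\left[1 - \alpha_2 (1 - p^{(2)})\right] f(p^*, \delta^3).
\end{aligned}$$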

Thanks to the recursive relation, one can compute $f(p^*, \delta^*)$ by
substituting the numerical values of the parameters.

$$f(p^*, \delta^*) = \frac{\tfrac{3}{4}\,(6.6 - 6.606\ldots)}{1 - \tfrac{1}{4}} < 0$$

$$f(1, \delta^1) = R - \frac{c_1}{\alpha_1} = 6.6 - 4/3 > 0$$

$$f(0, \delta^0) = R - \frac{c_2}{\alpha_2} = 6.6 - 6 > 0$$

$$f(p^*, \delta^2) = \alpha_1 p^* R - c_1 + (1 - \alpha_1 p^*)\left[\alpha_2 (1 - p^{(1)}) R - c_2\right] = \frac{6}{11}\,(6.6 - 6.583\ldots) > 0.$$


Thus  all  the  conditions  in  the  lemma  are  satisfied,  and  the 


counterexample  is  complete. 
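The four conditions of Lemma 2.1 can be checked with exact rational arithmetic. The following is a minimal sketch under the parameter values stated above; the helper names `search` and `finite_value` are assumptions introduced for illustration, and $\delta^2$ is taken as "search box 1, then box 2, then stop" as defined in the text.

```python
from fractions import Fraction as F

R, a1, a2, c1, c2 = F(66, 10), F(3, 4), F(1, 2), F(1), F(3)
p_star = (a2 / c2) / (a1 / c1 + a2 / c2)  # solves a1*p/c1 = a2*(1-p)/c2

def search(p, box):
    """Return (detection probability, posterior p after a miss) for one search."""
    hit = a1 * p if box == 1 else a2 * (1 - p)
    left = (1 - a1) * p if box == 1 else p  # unnormalized mass on box 1 after a miss
    return hit, left / (1 - hit)

def finite_value(p, boxes):
    """Expected net reward of searching `boxes` in order, then stopping.

    Also returns the probability of no detection and the final posterior.
    """
    value, alive = F(0), F(1)  # alive = P(no detection so far)
    for b in boxes:
        hit, p_new = search(p, b)
        value += alive * (hit * R - (c1 if b == 1 else c2))
        alive *= 1 - hit
        p = p_new
    return value, alive, p

# One period of delta^3 = 1,2,2; the state returns to p*, so f = cycle + alive * f.
one_cycle, alive, p_end = finite_value(p_star, [1, 2, 2])
f_star = one_cycle / (1 - alive)

f_delta2, _, _ = finite_value(p_star, [1, 2])  # box 1, then box 2, then stop
f_delta1 = R - c1 / a1                         # search box 1 forever, from p = 1
f_delta0 = R - c2 / a2                         # search box 2 forever, from p = 0
```

Under these values the sketch gives $p^* = 2/11$, $f(p^*, \delta^*) = -1/165$ and $f(p^*, \delta^2) = 1/110$, in agreement with the signs required by the lemma.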
CHAPTER 3

OPTIMAL  SEARCH  OF  A  MOVING  TARGET 

3.1  Introduction  and  Formulation 

Let there be $n$ boxes. A target is initially in box $i$ with a given
probability $p_i$, where $p_i \ge 0$, $\sum_i p_i = 1$. Then at discrete
time (or stage) $t = 1, 2, \ldots$, it moves from box to box. If at time
$t$ the target is in box $i$, then it will be in box $j$ with probability
$p_{ij}$ at time $t + 1$, where $T = [p_{ij}]$, $i = 1, \ldots, n$,
$j = 1, \ldots, n$, is a Markov transition probability matrix.

A sequential search is made. At each time $t$, a decision is made as to
which box to search. The searching process continues until the target is
found or until one decides to stop when there is a stop option. Searching
box $i$ incurs a cost $c_i > 0$ and finds the target with probability
$\alpha_i$ if the target is in box $i$ at that time (i.e.,
$\beta_i = 1 - \alpha_i$ is the overlook probability for the $i$th box).
Let $R$ be the reward earned when the target is found. This is needed only
when there is a stop option.

The  objective  is  to  maximize  the  probability  of  finding  the  target 
in  a  given  number  of  searches  or  to  minimize  the  expected  net  searching 
cost  (expected  searching  cost  minus  expected  reward). 

Let the state $\mathbb{P}$ be defined as the vector of posterior
probabilities, $\mathbb{P} = (p_1, \ldots, p_n)$, where $p_i$ is the
probability that the target is in box $i$ at that time.

A strategy is any rule for determining when to search, and, if so, which
box. It is a sequence $\delta = (\delta_1, \ldots, \delta_s)$ where
$\delta_i \in \{1, 2, \ldots, n\}$ for $i = 1, \ldots, s$ and
$s \in \{0, 1, 2, \ldots, \infty\}$. The policy $\delta$ instructs the
searcher to search box $\delta_i$ at the $i$th stage and to stop searching
if the object has not been found after the $s$th search. $s = 0$ means that
the searcher stops immediately; $s = \infty$ means that he does not stop
until he finds the target. When there is no stop option, $s = \infty$.

For any strategy $\delta$, any state $\mathbb{P}$ and any integer $m$,
define the following functions:

$f^m(\mathbb{P}, \delta)$ = the probability of finding the target in $m$
searches when $\mathbb{P}$ is the initial state and strategy $\delta$ is
employed.

$f_i^m(\delta)$ = the conditional probability of finding the target in $m$
searches given that the target is initially in box $i$ and strategy
$\delta$ is employed.

$g(\mathbb{P}, \delta)$ = the expected net searching cost when $\mathbb{P}$
is the initial state and strategy $\delta$ is employed.

$$f^m(\mathbb{P}) = \sup_\delta f^m(\mathbb{P}, \delta)$$

$$g(\mathbb{P}) = \inf_\delta g(\mathbb{P}, \delta)$$

Note that $f^m(\mathbb{P}, \delta) = \sum_i p_i f_i^m(\delta)$.

Let $T_i \mathbb{P} = [(T_i \mathbb{P})_1, \ldots, (T_i \mathbb{P})_n]$,
$i = 1, 2, \ldots, n$, where $(T_i \mathbb{P})_j$ is the posterior
probability that the target is in box $j$ at the next stage given that a
present search of box $i$ has not uncovered it.

Let $\mathbb{P}^i = (p_1^i, \ldots, p_n^i)$, $i = 1, 2, \ldots, n$, where
$p_j^i$ is the posterior probability that the target was in box $j$ prior
to the search given that a present search of box $i$ has not uncovered it.

Then $T_i \mathbb{P} = \mathbb{P}^i T$, where $T$ is the transition
probability matrix, and

$$p_j^i = \begin{cases} p_j (1 - \alpha_i p_i)^{-1} & (j \ne i) \\ (1 - \alpha_i) p_i (1 - \alpha_i p_i)^{-1} & (j = i). \end{cases}$$
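The state transition $T_i \mathbb{P} = \mathbb{P}^i T$ can be sketched in a few lines. The function name `update_after_miss` and the numerical values are assumptions for illustration only; boxes are indexed from 0 and `T[j][k]` stands for $p_{jk}$.

```python
def update_after_miss(p, T, alpha, i):
    """Return T_i P = P^i T: next-stage location probabilities after a failed search of box i."""
    n = len(p)
    miss = 1.0 - alpha[i] * p[i]  # probability that the search of box i fails
    # P^i: posterior of the location before the move, given the miss
    p_i = [((1.0 - alpha[j]) if j == i else 1.0) * p[j] / miss for j in range(n)]
    # One Markov step: (P^i T)_k = sum_j p^i_j * p_jk
    return [sum(p_i[j] * T[j][k] for j in range(n)) for k in range(n)]

# Illustrative (assumed) data for a two-box model.
p = [0.5, 0.5]
T = [[0.9, 0.1], [0.2, 0.8]]
alpha = [0.5, 1.0]
nxt = update_after_miss(p, T, alpha, 0)
```

Here the miss first shifts mass away from the searched box (giving $\mathbb{P}^i = (1/3, 2/3)$), and the Markov step then mixes the two boxes according to $T$.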


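Theorem 3.1:

In the moving target model, let state $\mathbb{P} = (p_1, \ldots, p_n)$ be
given. Suppose there is a box $i$ such that

$$\alpha_i p_i p_{ik} \ge \alpha_j p_j p_{jk} \quad \forall\, k, j.$$

Then to maximize the probability of finding the target in $m$ searches,
where $m$ is any given number, an optimal strategy first searches box $i$.

Proof:

Let $\delta$ be any strategy. For any box $j$, let $S_j \delta$ be the
strategy that searches first box $j$ then follows the strategy $\delta$.

Let box $i$ be the same as defined in the theorem. If one can show

$$f^m(\mathbb{P}, S_i \delta) \ge f^m(\mathbb{P}, S_j \delta)$$

for any $j$, any strategy $\delta$, then the theorem is proven. Now

$$\begin{aligned}
f^m(\mathbb{P}, S_j \delta)
&= \alpha_j p_j + (1 - \alpha_j p_j) f^{m-1}(T_j \mathbb{P}, \delta) \\
&= \alpha_j p_j + (1 - \alpha_j p_j) \sum_k (\mathbb{P}^j T)_k f_k^{m-1}(\delta) \\
&= \alpha_j p_j + \sum_k \Big[ \sum_{r \ne j} p_r p_{rk} + (1 - \alpha_j) p_j p_{jk} \Big] f_k^{m-1}(\delta) \\
&= \alpha_j p_j + \sum_k \Big( \sum_r p_r p_{rk} - \alpha_j p_j p_{jk} \Big) f_k^{m-1}(\delta) \\
&= \sum_k \alpha_j p_j p_{jk} \left[ 1 - f_k^{m-1}(\delta) \right] + \sum_k \sum_r p_r p_{rk} f_k^{m-1}(\delta) \\
&\le \sum_k \alpha_i p_i p_{ik} \left[ 1 - f_k^{m-1}(\delta) \right] + \sum_k \sum_r p_r p_{rk} f_k^{m-1}(\delta) \\
&= f^m(\mathbb{P}, S_i \delta).
\end{aligned}$$

Q.E.D.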


To illustrate the use of the above theorem, the following facts are
noticed.

1. If $p_i = 1$, $p_j = 0$ $\forall\, j \ne i$, then an optimal strategy
first searches box $i$.

2. If $p_{jk} = v_k$ $\forall\, j$, then an optimal strategy first
searches a box with $\max_i \alpha_i p_i$. This is the no information case
and will be treated later in more detail.


3.2 Some Special Cases of the Moving Target Model

In this section, two special cases of the model will be explored.

Consider first the case where $p_{ij} = v_j$ $\forall\, i$. This is the
case where no information is gained from the previous stage. Let
$V = (v_1, v_2, \ldots, v_n)$.


Theorem 3.2:

a) To maximize the probability of finding the target in $m$ searches,
where $m$ is any given positive integer, an optimal strategy searches at
each stage the box with $\max_i \alpha_i p_i$, where $p_i$ is the posterior
probability that the target is in box $i$ at that stage. The maximized
$f^m(\mathbb{P})$ is

$$f^m(\mathbb{P}) = 1 - \left[ 1 - \max_i \alpha_i p_i \right] \left( 1 - \max_i \alpha_i v_i \right)^{m-1}.$$

b) To minimize the expected searching cost before finding the target, an
optimal strategy, as well as the minimized expected searching cost, can be
determined by

$$g(\mathbb{P}) = \min_i \left[ c_i + (1 - \alpha_i p_i)\, g(V) \right]$$

$$g(V) = \min_j \frac{c_j}{\alpha_j v_j}$$

(assuming not all $\alpha_j v_j$ are zero).


Proof:

a) Since $p_{ij} = v_j$ $\forall\, i$, the posterior probability vector for
the next stage, given that a present search has not uncovered the target,
is $V = (v_1, \ldots, v_n)$. Hence

$$\begin{aligned}
f^m(\mathbb{P}) &= \max_i \left[ \alpha_i p_i + (1 - \alpha_i p_i) f^{m-1}(V) \right] \\
&= \max_i \left[ \alpha_i p_i \left( 1 - f^{m-1}(V) \right) \right] + f^{m-1}(V).
\end{aligned}$$

Since $1 - f^{m-1}(V) \ge 0$, searching the box with $\max_i \alpha_i p_i$
is optimal and part (a) is proven.

b)

$$g(\mathbb{P}) = \min_i \left[ c_i + (1 - \alpha_i p_i)\, g(V) \right]$$

$$g(V) = \min_i \left[ c_i + (1 - \alpha_i v_i)\, g(V) \right].$$

To minimize $g(V)$, one can simply solve the second equation for different
$i$'s. Thus

$$g(V) = \min_i \frac{c_i}{\alpha_i v_i}.$$

Q.E.D.
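The formulas of Theorem 3.2 can be compared numerically with the recursions used in the proof. This is a minimal sketch; the function names and the data below are assumptions for illustration and are not part of the thesis.

```python
def no_info_max_prob(p, v, alpha, m):
    """Closed form of part (a): 1 - (1 - max a_i p_i) * (1 - max a_i v_i)^(m-1)."""
    best_p = max(a * x for a, x in zip(alpha, p))
    best_v = max(a * x for a, x in zip(alpha, v))
    return 1.0 - (1.0 - best_p) * (1.0 - best_v) ** (m - 1)

def no_info_dp(p, v, alpha, m):
    """Same quantity from f^m(P) = max_i [a_i p_i + (1 - a_i p_i) f^{m-1}(V)]."""
    if m == 0:
        return 0.0
    tail = no_info_dp(v, v, alpha, m - 1)  # after any miss the state is V
    return max(a * x + (1.0 - a * x) * tail for a, x in zip(alpha, p))

def no_info_min_cost(v, alpha, c):
    """Part (b): g(V) = min_i c_i / (a_i v_i), skipping boxes with a_i v_i = 0."""
    return min(ci / (a * x) for ci, a, x in zip(c, alpha, v) if a * x > 0)

# Illustrative (assumed) data for three boxes.
p = [0.2, 0.3, 0.5]
v = [0.5, 0.25, 0.25]
alpha = [0.9, 0.6, 0.4]
c = [1.0, 2.0, 3.0]
g_v = no_info_min_cost(v, alpha, c)
```

For these values the closed form and the recursion agree for every horizon tried, and `g_v` satisfies the fixed-point equation $g(V) = \min_i [c_i + (1 - \alpha_i v_i) g(V)]$.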

Consider now the case where $\alpha_i = 1$. This is called the perfect
detection case. The following theorem applies.

Theorem 3.3:

In the moving target model, assume $\alpha_i = 1$. To maximize the
probability of finding the target in $m$ searches, suppose an optimal
strategy first searches box $i$ at state
$\mathbb{P} = (p_1, \ldots, p_n)$. Then the same is true at state
$\mathbb{P}' = (p_1', \ldots, p_n')$ if $p_i' \ge p_i$ and $p_j$ is
proportionally decreased $\forall\, j \ne i$. That is,

$$p_j' = \lambda p_j \quad \text{where} \quad 0 \le \lambda = \frac{1 - p_i'}{1 - p_i} \le 1.$$


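Proof:

For any strategy $\delta$, any box $j$, let $S_j \delta$ be the strategy
that first searches box $j$ and then follows strategy $\delta$. Then, for
any state $\mathbb{P}$, conditioning on the initial location of the target
yields

$$f^m(\mathbb{P}, S_j \delta) = p_j + \sum_{k \ne j} p_k f_k^{m-1}(\delta).$$

Notice that if the target is not in box $j$ initially, then the
probability of finding it depends on $\delta$ only.

Let box $i$ be the same as given in the theorem. By assumption, the
strategy that searches box $i$ first and then follows an optimal strategy,
say $\delta^*$, will be optimal for state $\mathbb{P}$, i.e.,

$$f^m(\mathbb{P}) = f^m(\mathbb{P}, S_i \delta^*).$$

Let box $k$ be any box $k \ne i$. At state $\mathbb{P}'$ as given in the
theorem, let $S_k \delta'$ be the strategy that searches box $k$ first and
then follows an optimal strategy $\delta'$. If one can prove

$$f^m(\mathbb{P}', S_i \delta^*) - f^m(\mathbb{P}', S_k \delta') \ge 0,$$

then there exists an optimal strategy which first searches box $i$ at
state $\mathbb{P}'$. Now

$$\begin{aligned}
& f^m(\mathbb{P}', S_i \delta^*) - f^m(\mathbb{P}', S_k \delta') \\
&\ge f^m(\mathbb{P}', S_i \delta^*) - f^m(\mathbb{P}', S_k \delta') - \lambda \left[ f^m(\mathbb{P}, S_i \delta^*) - f^m(\mathbb{P}, S_k \delta') \right] \\
&= f^m(\mathbb{P}', S_i \delta^*) - \lambda f^m(\mathbb{P}, S_i \delta^*) - \left[ f^m(\mathbb{P}', S_k \delta') - \lambda f^m(\mathbb{P}, S_k \delta') \right] \\
&= p_i' + \sum_{j \ne i} p_j' f_j^{m-1}(\delta^*) - \lambda p_i - \lambda \sum_{j \ne i} p_j f_j^{m-1}(\delta^*) \\
&\quad - \Big[ p_k' + \sum_{j \ne k} p_j' f_j^{m-1}(\delta') - \lambda p_k - \lambda \sum_{j \ne k} p_j f_j^{m-1}(\delta') \Big] \\
&= p_i' - \lambda p_i - \left[ p_i' f_i^{m-1}(\delta') - \lambda p_i f_i^{m-1}(\delta') \right] \\
&= (p_i' - \lambda p_i) \left[ 1 - f_i^{m-1}(\delta') \right] \ge 0.
\end{aligned}$$

Q.E.D.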
Pollock (1970) analyzed the model with two boxes for the above two cases.
He characterized the optimal strategy as a function of the initial
probability distribution as follows. If $p$ and $1 - p$ are respectively
the probabilities that the target is in box 1 and box 2, then an optimal
strategy is to search box 1 when $p \ge p^*$ and search box 2 when
$p \le p^*$. $p^*$ can be explicitly computed. For the general case of the
two box model (i.e., with no restriction on either $\alpha_i$ or $T$), he
claimed a similar result holds but that $p^*$ remains to be determined. He
gave an incorrect proof. The proof was based on the implicit assertion that
two convex functions $f_1$, $f_2$ on the real interval $[0, 1]$ intersect
at only one point if $f_1(0) > f_2(0)$, $f_1(1) < f_2(1)$. This is clearly
wrong. Consider the following two functions $f_1$ and $f_2$ on the real
line. They satisfy the above conditions but they may intersect at any odd
number of points.

[Figure: two convex functions $f_1$ and $f_2$ on $[0, 1]$ intersecting at several points.]
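A concrete pair of convex functions with three intersections can be written down explicitly. The particular functions below are an assumed example, not the pair drawn in the thesis figure: $f_2(x) = 2x^2$ and $f_1 = f_2$ minus a cubic with roots $1/4$, $1/2$, $3/4$, so that $f_1''(x) = 7 - 6x > 0$ on $[0, 1]$.

```python
def f2(x):
    """Convex quadratic (second derivative 4)."""
    return 2.0 * x * x

def f1(x):
    """f2 minus a cubic with roots 1/4, 1/2, 3/4; second derivative 7 - 6x > 0 on [0, 1]."""
    return f2(x) - (x - 0.25) * (x - 0.5) * (x - 0.75)

def sign_changes(values):
    """Count sign changes in a sequence, ignoring exact zeros."""
    count, last = 0, 0
    for v in values:
        s = (v > 0) - (v < 0)
        if s != 0:
            if last != 0 and s != last:
                count += 1
            last = s
    return count

grid = [k / 1000 for k in range(1001)]
crossings = sign_changes([f1(x) - f2(x) for x in grid])  # number of intersections on [0, 1]
```

Both functions are convex, $f_1(0) > f_2(0)$ and $f_1(1) < f_2(1)$, yet the difference changes sign three times, which is enough to invalidate the single-intersection assertion.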


Comment: In the general case of the two box model, it is not yet known
whether an optimal strategy as a function of $p$ has only two regions.
In the rest of this chapter, some more special cases will be studied.

3.3 The Moving Target Model with Some Transition Probability
Matrices Related to a Jordan Matrix

A  matrix  is  a  Jordan  matrix  if  there  exists  exactly  one  1  in 
each  row  and  each  column,  whereas  all  the  rest  of  the  elements  are 
zero. 

In the moving target model, if $T$ is a Jordan matrix, it means the
following. At time $t$, if the target is in box $i$, then at time $t + 1$
it moves to box $h(i)$ with probability one, $i = 1, \ldots, n$,
$h(i) \in \{1, 2, \ldots, n\}$, and $h(i) \ne h(j)$ for $i \ne j$. If, in
addition, $\alpha_i = \alpha$ $\forall\, i$, then after every search, it
appears as though the target is stationary but that the boxes are
renumbered. Note that if $T = I$, the identity matrix, then the target is
stationary.

For any strategy $\delta$, any state $\mathbb{P}$, any transition
probability matrix $T$, and any integer $m$, let

$g^m(\mathbb{P}, T; \delta)$ = the probability of finding the target in
$m$ searches when $T$ is the transition probability matrix for the model,
$\mathbb{P}$ is the initial state and strategy $\delta$ is employed.

$$g^m(\mathbb{P}, T) = \sup_\delta g^m(\mathbb{P}, T; \delta).$$

Let

$g_i^m(T; \delta)$ = the conditional probability of finding the target in
$m$ searches, given that the target is in box $i$ initially, the transition
probability matrix is $T$ and strategy $\delta$ is employed.

Then

$$g^m(\mathbb{P}, T; \delta) = \sum_i p_i\, g_i^m(T; \delta).$$

Theorem 3.4:

Suppose $T$ is a Jordan matrix, $\alpha_i \equiv \alpha$. Then, to
maximize the probability of finding the target in $N$ searches, where $N$
is any given number, an optimal strategy searches the box with
$\max_i p_i$ each time, where $p_i$ is the posterior probability that the
target is in box $i$ at that time. Also

$$g^N(\mathbb{P}, T) = g^N(\mathbb{P}, I)$$

where $I$ is the identity matrix, i.e., when $\alpha_i \equiv \alpha$, the
maximized probability of finding the target in $N$ searches when $T$ is a
Jordan matrix is the same as when the target is stationary.

To prove the theorem, induction will be used. $N = 1$ is trivial.
It will be verified that if the theorem holds for $N = m - 1$ then it
holds for $N = m$ as well. Now

$$g^m(\mathbb{P},T) = \max_i \left\{ \alpha p_i + (1 - \alpha p_i)\, g^{m-1}(T_i\mathbb{P},T) \right\} .$$

By induction hypothesis

$$g^{m-1}(T_i\mathbb{P},T) = g^{m-1}(T_i\mathbb{P},I) .$$

By definition, $T_i\mathbb{P} = \mathbb{P}^i T$. Hence

$$g^{m-1}(T_i\mathbb{P},I) = g^{m-1}(\mathbb{P}^i T,I) .$$

$g^{m-1}(\mathbb{P}^i T,I)$ corresponds to the probability function for
an $m - 1$ stage search model where the target is stationary,
$\alpha_i = \alpha$, and the initial state is $\mathbb{P}^i T$.
Multiplying $\mathbb{P}^i$ by $T$ means nothing but a renumbering of the
boxes, where all the boxes are identical ($\alpha_i = \alpha$).
Therefore,

$$g^{m-1}(\mathbb{P}^i T,I) = g^{m-1}(\mathbb{P}^i,I)$$

and

$$g^m(\mathbb{P},T) = \max_i \left\{ \alpha p_i + (1 - \alpha p_i)\, g^{m-1}(\mathbb{P}^i,I) \right\} .$$

A look at the definition of $\mathbb{P}^i$ shows that the right-hand
side is just the formulation for an $m$ stage stationary target model.
Hence searching a box with $\max_i p_i$ is optimal and

$$g^m(\mathbb{P},T) = g^m(\mathbb{P},I) .$$

Q.E.D.
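As an illustrative numerical check of the recursion used in the proof
(not part of the original derivation), the following Python sketch
evaluates $g^N(\mathbb{P},T)$ by brute-force dynamic programming and
compares every permutation (Jordan) matrix with the identity. The
function names, the value of `ALPHA` and the test vector are
assumptions made for this example only.

```python
from itertools import permutations

ALPHA = 0.6  # common detection probability; illustrative value only


def posterior_after_miss(p, i, alpha=ALPHA):
    """Posterior P^i after an unsuccessful search of box i (before the move)."""
    miss = 1.0 - alpha * p[i]
    return [((1.0 - alpha) * pj if j == i else pj) / miss
            for j, pj in enumerate(p)]


def move(p, T):
    """Row vector p multiplied by the transition matrix T."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


def g(m, p, T, alpha=ALPHA):
    """Maximum probability of finding the target in m searches."""
    if m == 0:
        return 0.0
    best = 0.0
    for i in range(len(p)):
        hit = alpha * p[i]
        nxt = move(posterior_after_miss(p, i, alpha), T)
        best = max(best, hit + (1.0 - hit) * g(m - 1, nxt, T, alpha))
    return best


def permutation_matrix(h):
    """Jordan matrix: the target in box i moves to box h[i] with probability one."""
    n = len(h)
    return [[1.0 if h[i] == j else 0.0 for j in range(n)] for i in range(n)]


def max_gap_over_permutations(p, N):
    """Largest |g^N(P,T) - g^N(P,I)| over all permutation matrices T."""
    base = g(N, p, permutation_matrix(range(len(p))))
    return max(abs(g(N, p, permutation_matrix(h)) - base)
               for h in permutations(range(len(p))))
```

Under these assumptions the gap should vanish up to rounding, in
agreement with $g^N(\mathbb{P},T) = g^N(\mathbb{P},I)$.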
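Theorem 3.5:

For any integer $m$, any prior probability vector $\mathbb{P}$, and any
transition probability matrix $T$, $g^m(\mathbb{P},T)$ is a convex
function of $\mathbb{P}$.

Proof:

Let $\mathbb{P}^1 = (p_1^1, \ldots, p_n^1)$ and
$\mathbb{P}^2 = (p_1^2, \ldots, p_n^2)$ be any two prior probability
vectors. Let $\lambda$ be any number, $0 \le \lambda \le 1$. Then

$$\begin{aligned}
g^m[\lambda\mathbb{P}^1 + (1-\lambda)\mathbb{P}^2, T]
 &= \sup_{\delta}\, g^m[\lambda\mathbb{P}^1 + (1-\lambda)\mathbb{P}^2, T; \delta] \\
 &= \sup_{\delta} \sum_i [\lambda\mathbb{P}^1 + (1-\lambda)\mathbb{P}^2]_i\; g_i^m(T;\delta) \\
 &\le \lambda \sup_{\delta} \sum_i p_i^1\, g_i^m(T;\delta)
    + (1-\lambda) \sup_{\delta} \sum_i p_i^2\, g_i^m(T;\delta) \\
 &= \lambda\, g^m(\mathbb{P}^1,T) + (1-\lambda)\, g^m(\mathbb{P}^2,T) .
\end{aligned}$$

Hence, $g^m(\mathbb{P},T)$ is a convex function of $\mathbb{P}$.

Q.E.D.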
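The convexity statement can be spot-checked numerically. The sketch
below is an illustration under assumed values (a two-box model, a
common detection probability `alpha`, and an arbitrary stochastic
matrix `T`); the helper names are hypothetical and the recursion is the
dynamic programming formulation used in this chapter.

```python
def g(m, p, T, alpha):
    """Maximum probability of detection in m searches for a moving target."""
    if m == 0:
        return 0.0
    n = len(p)
    best = 0.0
    for i in range(n):
        hit = alpha * p[i]
        # posterior after an unsuccessful search of box i, then one move by T
        post = [((1.0 - alpha) * pj if j == i else pj) / (1.0 - hit)
                for j, pj in enumerate(p)]
        nxt = [sum(post[r] * T[r][j] for r in range(n)) for j in range(n)]
        best = max(best, hit + (1.0 - hit) * g(m - 1, nxt, T, alpha))
    return best


def convexity_gap(m, P1, P2, lam, T, alpha):
    """lam*g(P1) + (1-lam)*g(P2) - g(lam*P1 + (1-lam)*P2); convexity means >= 0."""
    mix = [lam * x + (1.0 - lam) * y for x, y in zip(P1, P2)]
    return (lam * g(m, P1, T, alpha)
            + (1.0 - lam) * g(m, P2, T, alpha)
            - g(m, mix, T, alpha))
```

A nonnegative gap for every $\lambda$ tried is consistent with the
theorem; it is of course not a proof.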

The following theorem gives an upper bound for the maximum probability
of finding the target in $N$ searches for a large class of transition
probability matrices when $\alpha_i = \alpha$.

Theorem 3.6:

Let $T$ be a convex combination of Jordan matrices. Assume
$\alpha_i = \alpha$. Then $g^N(\mathbb{P},T) \le g^N(\mathbb{P},I)$,
where $I$ is the identity matrix.

Proof:

Induction will be used. When $N = 1$,
$g^1(\mathbb{P},T) = g^1(\mathbb{P},I) = \max_i \alpha p_i$, and the
theorem clearly holds.

It will be verified that if the theorem holds for $N = m - 1$, then it
holds for $N = m$ as well. Now

$$g^m(\mathbb{P},T) = \max_i \left\{ \alpha p_i + (1 - \alpha p_i)\, g^{m-1}(T_i\mathbb{P},T) \right\} .$$

By assumption, $T$ is a convex combination of Jordan matrices. Hence
$T$ can be written as $T = \sum_k a_k Q^k$, where the $Q^k$,
$k = 1, 2, \ldots$, are Jordan matrices and the $a_k$ are such that
$a_k \ge 0$, $\sum_k a_k = 1$. By induction hypothesis,

$$g^{m-1}(T_i\mathbb{P},T) \le g^{m-1}(T_i\mathbb{P},I) .$$

By definition,

$$g^{m-1}(T_i\mathbb{P},I) = g^{m-1}(\mathbb{P}^i T,I) .$$

$g^{m-1}(T_i\mathbb{P},I)$ is a convex function of $T_i\mathbb{P}$, by
the preceding theorem. Hence

$$g^{m-1}(\mathbb{P}^i T,I) = g^{m-1}\Big(\mathbb{P}^i \sum_k a_k Q^k, I\Big)
  \le \sum_k a_k\, g^{m-1}(\mathbb{P}^i Q^k,I) .$$

Since $Q^k$ is a Jordan matrix,

$$g^{m-1}(\mathbb{P}^i Q^k,I) = g^{m-1}(\mathbb{P}^i,I)$$

by the same arguments as used in the proof of Theorem 3.4. Therefore,

$$g^{m-1}(T_i\mathbb{P},I) \le g^{m-1}(\mathbb{P}^i,I)$$

and

$$g^m(\mathbb{P},T) \le \max_i \left\{ \alpha p_i + (1 - \alpha p_i)\, g^{m-1}(\mathbb{P}^i,I) \right\} .$$

But the right-hand side is just the maximized probability of an $m$
stage stationary target model. Hence

$$g^m(\mathbb{P},T) \le g^m(\mathbb{P},I) .$$

Q.E.D.


Consider a problem where one may decide to stop before finding the
target. Assume $c_i = c$ for all $i$. Let $R$ be the reward earned when
the target is found. The objective is to maximize the expected net
return (i.e., expected reward minus expected searching cost). Call
this problem (C). The following theorem applies.

Theorem 3.7:

Let $\mathbb{P} = (p_1, \ldots, p_n)$ be the posterior probability
vector. Suppose that in the problem of maximizing the probability of
finding the target in a given number of searches, an optimal strategy
searches at each time a box with $\max_i \alpha_i p_i$. Then in
problem (C) an optimal strategy either first searches a box with
$\max_i \alpha_i p_i$ or else stops. Note that this applies to the no
information case in Theorem 3.2, the case in Theorem 3.4 and the
stationary target case.

Proof:

Let $\delta$ be an optimal strategy. Its existence can be proven as
in Foss's paper [6]. Let $s$ be the time at which the searcher stops
if the target has not been found after the $s$th search; $s$ is
deterministic and may be infinite. Let $N_0$ be the time at which the
target is found, with $N_0 = \infty$ if the target is not found. $N_0$
is a random variable.

For any positive integer $i \le s$, let $P_\delta(N_0 < i)$ and
$P_\delta(N_0 = i)$ be respectively the probabilities that the target
is found before and at the $i$th search when strategy $\delta$ is
employed. $P_\delta(N_0 \ge i)$ is the probability that the target is
not found in the first $i - 1$ searches by using strategy $\delta$.

If $s = \infty$, then since $\delta$ is optimal, the target is found
with probability 1. Otherwise, the expected searching cost will be
infinite and $\delta$ cannot be optimal. Thus, if $s = \infty$, the
expected net reward is

$$R \cdot P_\delta(N_0 < \infty) - c \sum_{k=1}^{\infty} k\, P_\delta(N_0 = k)
  = R - c \sum_{k=1}^{\infty} P_\delta(N_0 \ge k) .$$

If $0 < s < \infty$, the expected net reward is

$$R \cdot P_\delta(N_0 \le s)
  - c \left[ \sum_{k=1}^{s-1} k\, P_\delta(N_0 = k) + s \cdot P_\delta(N_0 \ge s) \right]
  = R\,[1 - P_\delta(N_0 \ge s + 1)] - c \sum_{k=1}^{s} P_\delta(N_0 \ge k) .$$

In either case, if $s > 0$ (i.e., not stopping immediately), the
strategy of searching a box with $\max_i \alpha_i p_i$ will, by
assumption, minimize $P_\delta[N_0 \ge k]$ for all $k$. Hence, an
optimal strategy for problem (C) either first searches a box with
$\max_i \alpha_i p_i$ or else stops.
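The rewriting of the expected number of searches as a sum of tail
probabilities, which the proof relies on, can be verified with exact
arithmetic. The sketch below is illustrative only: the list `pmf`
(with `pmf[k-1]` standing for $P_\delta(N_0 = k)$) and the helper names
are assumptions, not notation from the text.

```python
from fractions import Fraction


def tail(pmf, k):
    """P(N0 >= k): the target is not found in the first k - 1 searches."""
    return 1 - sum(pmf[:k - 1])


def expected_searches_direct(pmf, s):
    """sum_{k<s} k P(N0 = k) + s P(N0 >= s), for a searcher who stops after s."""
    return sum(k * pmf[k - 1] for k in range(1, s)) + s * tail(pmf, s)


def expected_searches_tail_sum(pmf, s):
    """sum_{k=1}^{s} P(N0 >= k), the form used in the net-reward expression."""
    return sum(tail(pmf, k) for k in range(1, s + 1))


def net_reward(pmf, s, R, c):
    """R [1 - P(N0 >= s+1)] - c sum_{k=1}^{s} P(N0 >= k)."""
    return R * (1 - tail(pmf, s + 1)) - c * expected_searches_tail_sum(pmf, s)


# hypothetical detection-time distribution; the missing 1/8 is "not found by search 3"
example_pmf = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
```

Both forms of the expected number of searches agree for every stopping
time $s$ covered by `example_pmf`, which is why minimizing each
$P_\delta(N_0 \ge k)$ maximizes the net reward for a fixed $s$.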

3.5  A  Special  Case  with  Two  Boxes 

Consider a rather special case of the moving target model. Assume
there are two boxes and $\alpha_1 = \alpha_2 = \alpha$. The objective
is to maximize the probability of finding the target in $N$ searches,
where $N$ is any given number. The initial state is
$\mathbb{P} = (p_1, p_2)$. $T^1$ is the symmetric transition
probability matrix

$$T^1 = \begin{pmatrix} a & b \\ b & a \end{pmatrix}$$

where $a, b \ge 0$, $a + b = 1$.


An  optimal  strategy  for  the  above  problem  will  be  characterized 
in  the  following  theorems. 


Theorem  3.8: 

An optimal strategy for the above problem first searches the same
box as a similar problem with $\alpha_1 = \alpha_2 = \alpha$ but with
a transposed transition probability matrix

$$T^2 = \begin{pmatrix} b & a \\ a & b \end{pmatrix}$$

where $a$ and $b$ are the same as in $T^1$.

Also the maximum probabilities of finding the target are equal for
the two problems, i.e., $g^N(\mathbb{P},T^1) = g^N(\mathbb{P},T^2)$.

Remark:

This theorem implies that one can assume $a \ge b$ in $T^1$ in
deciding which box to search first.

Proof:

Induction will be used. $N = 1$ is trivial. It will be verified
that if the theorem holds for $N = m - 1$ then it holds for $N = m$
as well. Now for $k = 1, 2$

$$g^m(\mathbb{P},T^k) = \max_i \left\{ \alpha p_i + (1 - \alpha p_i)\, g^{m-1}(\mathbb{P}^i T^k, T^k) \right\}$$

where $\mathbb{P}^i = (\mathbb{P}^i_1, \ldots, \mathbb{P}^i_n)$ as
before. Recall that

$$\mathbb{P}^i_j = \begin{cases}
 p_j (1 - \alpha_i p_i)^{-1} & (j \ne i) \\
 (1 - \alpha_i)\, p_i (1 - \alpha_i p_i)^{-1} & (j = i) .
\end{cases}$$

By induction hypothesis

$$g^{m-1}(\mathbb{P}^i T^1, T^1) = g^{m-1}(\mathbb{P}^i T^1, T^2) .$$

By definition $T^2 = T^1 Q$ where
$Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Multiplying the
posterior probability by $Q$ means a renumbering of the boxes. The
two boxes have the same parameters, namely, the same overlook
probability as well as the same probability of moving to the other box
after one search. Therefore, by symmetry

$$g^{m-1}(\mathbb{P}^i T^1 Q, T^1) = g^{m-1}(\mathbb{P}^i T^1, T^1) .$$

It follows that

$$g^{m-1}(\mathbb{P}^i T^2, T^2) = g^{m-1}(\mathbb{P}^i T^2, T^1) = g^{m-1}(\mathbb{P}^i T^1, T^1) .$$

Hence, by the above dynamic programming formulation, an optimal
strategy for both problems first searches the same box and
$g^m(\mathbb{P},T^1) = g^m(\mathbb{P},T^2)$.

Q.E.D.


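Theorem 3.9:

If $\mathbb{P}$ is such that $\dfrac{p_1}{p_2} \ge \dfrac{a}{b}$ or
$\dfrac{p_1}{p_2} \le \dfrac{b}{a}$, where $a \ge b \ge 0$ as before,
then an optimal strategy first searches the box with larger $p_i$.

Proof:

Since $p_{11} = p_{22} = a$ and $p_{12} = p_{21} = b$,

$$\frac{p_1}{p_2} \ge \frac{a}{b} \;\Longrightarrow\;
  \alpha p_1 p_{12} = \alpha p_1 b \ge \alpha p_2 a = \alpha p_2 p_{22} .$$

Also

$$\frac{p_1}{p_2} \ge \frac{a}{b} \ge 1 \;\Longrightarrow\;
  \alpha p_1 p_{11} = \alpha p_1 a \ge \alpha p_2 b = \alpha p_2 p_{21} .$$

Hence by Theorem 3.1, an optimal strategy first searches box 1 (i.e.,
the box with larger $p_i$). By symmetry, the case
$\dfrac{p_1}{p_2} \le \dfrac{b}{a}$ is the same.

Q.E.D.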
Theorem 3.10:

Assume $a \ge b$ in $T^1$ and the state $\mathbb{P}$ is such that
$1 - \alpha \le p_1/p_2 \le 1/(1 - \alpha)$. Then, to maximize the
probability of finding the target in $N$ searches, where $N$ is any
given integer, an optimal strategy is as follows. It first searches a
box with larger $p_i$, and then keeps on switching to the other box
after every search.

Proof:

The transition probability matrix is

$$T = T^1 = \begin{pmatrix} a & b \\ b & a \end{pmatrix}
  = (a - b) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  + 2b \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}
  = q_1 A_1 + q_2 A_2$$

where $q_1 = a - b \ge 0$, $q_2 = 2b \ge 0$,

$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
  A_2 = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix} .$$

One can consider that the target moves after every search in the
following way. With probability $q_1$ it moves according to $A_1$
(i.e., it stays). With probability $q_2 = 1 - q_1$, it moves according
to $A_2$. Physically, this does not occur, but one may think of the
ball as moving in the above fashion even after the ball is found.

Let $r$ be the first time it moves according to $A_2$,
$1 \le r \le N$. $r = 1$ means that after one search it moves
according to $A_2$ for the first time; $r = N$ means it always moves
according to $A_1$. $r$ is independent of the strategy.

Clearly, $r$ is a random variable with the following distribution.
Let $t$ be an integer, $1 \le t \le N$. Then

$$r = t \text{ with probability } q_1^{\,t-1} q_2 , \qquad 1 \le t \le N - 1 ,$$

$$r = t \text{ with probability } q_1^{\,N-1} \quad \text{if } t = N .$$
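The decomposition of the transition matrix and the distribution of
$r$ can be checked with exact rational arithmetic. The following
sketch is illustrative; the function names and the sample values of
$a$, $b$ and $N$ are assumptions chosen for the example.

```python
from fractions import Fraction


def decompose(a, b):
    """Return (q1, q2, q1*A1 + q2*A2) with A1 = I and A2 = all entries 1/2."""
    q1, q2 = a - b, 2 * b
    half = Fraction(1, 2)
    A1 = [[1, 0], [0, 1]]
    A2 = [[half, half], [half, half]]
    T = [[q1 * A1[i][j] + q2 * A2[i][j] for j in range(2)] for i in range(2)]
    return q1, q2, T


def r_distribution(q1, q2, N):
    """P(r = t): first move according to A2 occurs after search t (t = N: never)."""
    dist = {t: q1 ** (t - 1) * q2 for t in range(1, N)}
    dist[N] = q1 ** (N - 1)
    return dist
```

For $a = 3/4$, $b = 1/4$ the recombined matrix should equal $T^1$ and
the probabilities of $r$ should sum to one.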


Let strategies $\delta^1$, $\delta^2$ be defined as follows:

$$\delta^1 = 1\,2\,1\,2 \ldots$$

$$\delta^2 = 2\,1\,2\,1 \ldots .$$

That is, $\delta^1$ and $\delta^2$ are alternating searching sequences
before the process ends.

Assume first $p_1 \ge p_2$. Then the theorem says that $\delta^1$ is
optimal if $1 \le p_1/p_2 \le 1/(1 - \alpha)$. The case $p_2 > p_1$ is
similar and omitted. To prove the theorem, induction will be used.
$N = 1$ is trivial. It will be verified that if the theorem holds for
$N \le m - 1$, then it holds for $N = m$ as well.

Recall that for any positive integer $m$,
$g^m(\mathbb{P},T;\delta)$ is the probability of finding the target in
$m$ searches when $\mathbb{P}$ is the state, $T$ is the transition
probability matrix and strategy $\delta$ is employed.

Let $V$ be the state $(\tfrac12, \tfrac12)$. Let $\delta^0$ be the
truncated $\delta^1$ after truncating the partial sequence for the
first $r$ stages. Conditioning on $r$ yields:

$$\begin{aligned}
g^m(\mathbb{P},T;\delta^1)
 &= E\left\{ g^r(\mathbb{P},A_1;\delta^1)
     + [1 - g^r(\mathbb{P},A_1;\delta^1)]\, g^{m-r}(V,T;\delta^0) \right\} \\
 &= 1 - E\left\{ [1 - g^r(\mathbb{P},A_1;\delta^1)]\,[1 - g^{m-r}(V,T;\delta^0)] \right\} .
\end{aligned}$$

The above formulation can be explained as follows. During the first
$r$ stages, the target moves according to $A_1$. Therefore, the
probability of finding the target in the first $r$ searches is
$g^r(\mathbb{P},A_1;\delta^1)$. At the end of the $r$th search, if the
target is not found, then it moves according to $A_2$. Hence the state
for the $(r+1)$st stage is $V = (\tfrac12, \tfrac12)$, and the strategy
is $\delta^0$, the truncated $\delta^1$. Notice that before the process
terminates, $\delta^0$ is either $\delta^1$ or $\delta^2$ depending on
whether $r$ is even or odd.

Since $A_1 = I$ and $p_1 \ge p_2$, by the results for a stationary
target model, an optimal strategy first searches box 1 to maximize
$g^r(\mathbb{P},A_1;\delta)$, where $\delta$ is any strategy. Since
$(1 - \alpha) p_1 \le p_2$, an optimal strategy next searches box 2.
But $\alpha_1 = \alpha_2 = \alpha$ implies that after searching twice
without finding the target, the state becomes $\mathbb{P}$ again for
the third stage. Therefore, repeating the above arguments shows that
$\delta^1$ is optimal for $g^r(\mathbb{P},A_1;\delta)$ for any $r$.

Now $r \ge 1$ and $V = (\tfrac12, \tfrac12) = (v_1, v_2)$ satisfies
$1 - \alpha \le v_1/v_2 \le 1/(1 - \alpha)$. Since
$v_1 = v_2 = \tfrac12$, by induction hypothesis
$g^{m-r}(V,T;\delta)$ is maximized by either $\delta^1$ or $\delta^2$.
Hence $\delta^0$ maximizes $g^{m-r}(V,T;\delta)$ for any $r$. It
follows that $\delta^1$ maximizes $g^m(\mathbb{P},T;\delta)$ and the
theorem is proven.

Q.E.D.
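For small horizons the claim can be checked by enumerating all search
sequences, since with no information other than "not yet found" a
strategy is just a fixed sequence of boxes. The sketch below is an
illustration under assumed parameter values ($a = 0.7$, $b = 0.3$,
$\alpha = 0.4$); the function names are hypothetical and boxes are
numbered 0 and 1.

```python
from itertools import product


def detection_probability(seq, p, a, b, alpha):
    """Probability of finding the target with the fixed search sequence seq."""
    p1, p2 = p
    not_found = 1.0   # probability that all earlier searches failed
    total = 0.0
    for box in seq:
        state = [p1, p2]
        hit = alpha * state[box]
        total += not_found * hit
        not_found *= 1.0 - hit
        # posterior after the failed search, then one move by T^1 = [[a, b], [b, a]]
        state[box] *= 1.0 - alpha
        s = state[0] + state[1]
        u1, u2 = state[0] / s, state[1] / s
        p1, p2 = a * u1 + b * u2, b * u1 + a * u2
    return total


def alternating(first, N):
    """Sequence first, other, first, ... of length N."""
    return [(first + t) % 2 for t in range(N)]


def alternating_is_optimal(p, a, b, alpha, N, tol=1e-12):
    first = 0 if p[0] >= p[1] else 1
    best = max(detection_probability(seq, p, a, b, alpha)
               for seq in product((0, 1), repeat=N))
    return detection_probability(alternating(first, N), p, a, b, alpha) >= best - tol
```

With these values the admissible band is $0.6 \le p_1/p_2 \le 1/0.6$,
and `alternating_is_optimal` compares the alternating sequence against
the best of all $2^N$ sequences.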

Theorem 3.11:

If $1 \le \dfrac{a}{b} \le \dfrac{1}{1 - \alpha}$, then for any state
$\mathbb{P}$, an optimal strategy first searches a box with larger
$p_i$.

Proof:

Theorem 3.9 says that if $\dfrac{p_1}{p_2} \ge \dfrac{a}{b}$ or if
$\dfrac{p_1}{p_2} \le \dfrac{b}{a}$, then an optimal strategy first
searches a box with larger $p_i$. Theorem 3.10 says that if
$1 - \alpha \le \dfrac{p_1}{p_2} \le \dfrac{1}{1 - \alpha}$, then an
optimal strategy first searches a box with larger $p_i$.

Now if $\dfrac{a}{b} \le \dfrac{1}{1 - \alpha}$, then
$\dfrac{p_1}{p_2}$ satisfies the condition of either Theorem 3.9 or
Theorem 3.10. Hence an optimal strategy first searches a box with
larger $p_i$.


satisfies 
an  optimal 


the  condition  of  either 
strategy  first  searches 


Remark:

Unfortunately, when $\dfrac{a}{b} > \dfrac{1}{1 - \alpha}$ and
$\dfrac{1}{1 - \alpha} < \dfrac{p_1}{p_2} < \dfrac{a}{b}$, the optimal
strategy is not characterized by the preceding theorems.
CHAPTER 4

OPTIMAL  SEARCH  WITH  RANDOM  OVERLOOK  PROBABILITIES 

4.1 Introduction

Consider an optimal search problem with $n$ boxes. Let $p_i$ be
the given prior probability that the target is in box $i$,
$p_i \ge 0$, $\sum_{i=1}^n p_i = 1$. The target is stationary. The
overlook probabilities, however, are allowed to be random variables.
Thus, searching box $i$ at time $t$ finds the target with probability
$\alpha_i^t > 0$ if the target is in box $i$. For fixed $i$,
$\alpha_i^1, \alpha_i^2, \ldots$ are independent identically
distributed random variables. The $\alpha_i^t$'s are told at time $t$
either before the search or after the search; the two cases will be
treated separately in the sections that follow.

4.2  Random  Overlook  Probabilities  Told  After  the  Search 

The main difference between this case and the model with deterministic
overlook probabilities is as follows. The posterior probabilities
after a search is made without finding the target are random
variables. It follows that a strategy is usually not a fixed sequence
of searches. In fact, the decision for time $t + 1$ is not made until
the $\alpha_i^t$'s are told, which occurs after a search is completed
at time $t$. A strategy $\delta$, therefore, is any rule for
determining which box to search at each time $t$. The rule depends on
the posterior probability distribution at that time. Since for fixed
$i$, $\alpha_i^t$ has the same distribution for all $t$,
$E\alpha_i^t$ shall be written as $E\alpha_i$.

Theorem 4.1:

Let $N$ be any integer and let $\mathbb{P} = (p_1, p_2, \ldots, p_n)$
be the prior probabilities. To maximize the probability of finding the
target in $N$ searches, an optimal strategy first searches the box
with $\max_i p_i E\alpha_i$.

Proof:

Let $N$ be the allowed number of searches.

Using induction, $N = 1$ is trivial. It will be verified that if
the theorem holds for $N = m - 1$, then it holds for $N = m$.

In an $m$-stage problem, when the initial state is
$\mathbb{P} = (p_1, \ldots, p_n)$, let box $j$ be such that
$p_j E\alpha_j = \max_i p_i E\alpha_i$. Suppose an optimal strategy
first searches box $k$, with $p_k E\alpha_k \ne \max_i p_i E\alpha_i$.
After the search, if the target is not found, then the posterior
probability distribution $\mathbb{P}' = (p_1', p_2', \ldots, p_n')$,
as a function of the told $\alpha_k$, is

$$p_i' = \begin{cases}
 \dfrac{(1 - \alpha_k)\, p_k}{1 - \alpha_k p_k} & i = k \\[2ex]
 \dfrac{p_i}{1 - \alpha_k p_k} & i \ne k .
\end{cases}$$

$\mathbb{P}'$ is a random vector since $\alpha_k$ is a random
variable. Moreover, $\mathbb{P}'$ can be considered as the initial
state for the remaining $m - 1$ stage problem. Now

$$p_j E\alpha_j \ge p_i E\alpha_i \quad \forall\, i \ne k , \qquad
  p_j E\alpha_j \ge p_k E\alpha_k \ge (1 - \alpha_k)\, p_k E\alpha_k .$$

Hence $p_j' E\alpha_j \ge p_i' E\alpha_i$ for all $i$. This is true
regardless of the value of $\alpha_k$.

By induction hypothesis, it will be optimal then to search box $j$
first for the remaining $m - 1$ stage problem. Thus an optimal
strategy for the original $m$ stage problem is to search first box
$k$, then box $j$, then continue optimally by following $\delta^*$,
say. Let the whole strategy be denoted by $S_k S_j \delta^*$.

For any strategy $\delta$ and any prior probability vector
$\mathbb{P}$, let $f^m(\mathbb{P},\delta)$ = the probability of
finding the ball in $m$ searches when $\mathbb{P}$ is the vector of
prior probabilities and strategy $\delta$ is employed.

If one can show

$$f^m(\mathbb{P}, S_j S_k \delta^*) - f^m(\mathbb{P}, S_k S_j \delta^*) \ge 0$$

then there exists an optimal strategy which first searches box $j$,
where $p_j E\alpha_j = \max_i p_i E\alpha_i$. Let the $\alpha$'s for
the second stage be $\alpha'$. Let $T_j T_k \mathbb{P}$ be the
posterior probabilities after searching first box $j$ then box $k$
without finding the target. Then

$$\begin{aligned}
f^m(\mathbb{P}, S_j S_k \delta^*) - f^m(\mathbb{P}, S_k S_j \delta^*)
 ={}& p_j E\alpha_j + p_k E\alpha_k'
   + E\,(1 - \alpha_j p_j - \alpha_k' p_k)\, f^{m-2}(T_j T_k \mathbb{P}, \delta^*) \\
  & - p_k E\alpha_k - p_j E\alpha_j'
   - E\,(1 - \alpha_k p_k - \alpha_j' p_j)\, f^{m-2}(T_k T_j \mathbb{P}, \delta^*)
\end{aligned}$$

where

$$T_j T_k \mathbb{P} = \frac{1}{1 - \alpha_j p_j - \alpha_k' p_k}
  \left( p_1, \ldots, (1 - \alpha_j) p_j, \ldots, (1 - \alpha_k') p_k, \ldots, p_n \right) .$$

By assumption, for any $i$, $\alpha_i$ and $\alpha_i'$ have the same
distribution. Hence
$f^m(\mathbb{P}, S_j S_k \delta^*) - f^m(\mathbb{P}, S_k S_j \delta^*) = 0$.
Assume searching box $i$ costs $c_i$, $0 < c_i < \infty$. The problem
of minimizing the expected searching cost will now be treated. For any
strategy $\delta$ and any prior probability vector $\mathbb{P}$, let
$g^m(\mathbb{P},\delta)$ = the $m$ stage expected searching cost when
$\mathbb{P}$ is the vector of prior probabilities and strategy
$\delta$ is employed.

Similarly, let $g(\mathbb{P},\delta)$ = the total expected searching
cost when $\mathbb{P}$ is the vector of prior probabilities and
strategy $\delta$ is employed.

Define

$$g^m(\mathbb{P}) = \inf_{\delta}\, g^m(\mathbb{P},\delta)$$
$$g(\mathbb{P}) = \inf_{\delta}\, g(\mathbb{P},\delta) .$$

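Lemma 4.2:

Let $\min_i c_i = c > 0$, $E\alpha_i > 0$, and
$\max_i c_i = k < \infty$. Then

$$g(\mathbb{P}) \le \frac{k}{c} \sum_{i=1}^{n} \frac{c_i}{E\alpha_i} = M \text{ (say)} .$$

Proof:

Let $\delta^1$ be the strategy of always searching the box with
$\max_i \dfrac{p_i E\alpha_i}{c_i}$ at any time $t$. For any strategy
$\delta$, let $N^*(\delta)$ be the random time at which the target is
found by using strategy $\delta$, with $N^*(\delta) = \infty$ if the
target is never found. Then

$$g(\mathbb{P}) \le g(\mathbb{P},\delta^1) \le k \cdot E N^*(\delta^1)$$

$$E N^*(\delta^1) = \sum_{m=0}^{\infty} P(N^*(\delta^1) > m)$$

$$P(N^*(\delta^1) > m) = P(\text{not finding the target in } m
  \text{ searches by using strategy } \delta^1) .$$

The minimal value of $\max_i \dfrac{p_i E\alpha_i}{c_i}$ is achieved
by the vector having

$$\frac{p_1 E\alpha_1}{c_1} = \frac{p_2 E\alpha_2}{c_2} = \cdots = \frac{p_n E\alpha_n}{c_n} ,$$

for which the common value is
$\left( \sum_{i=1}^n c_i / E\alpha_i \right)^{-1}$. Now each time
$\delta^1$ searches a box with $\max_i \dfrac{p_i E\alpha_i}{c_i}$.
Thus, each time $\delta^1$ searches a box (say box $j$), the
probability $p_j E\alpha_j$ that the target will be found is such that

$$p_j E\alpha_j \ge c_j \left( \sum_{i=1}^n \frac{c_i}{E\alpha_i} \right)^{-1}
  \ge c \left( \sum_{i=1}^n \frac{c_i}{E\alpha_i} \right)^{-1} .$$

Hence

$$P(N^*(\delta^1) > m) \le \left[ 1 - c \left( \sum_{i=1}^n \frac{c_i}{E\alpha_i} \right)^{-1} \right]^m$$

$$E N^*(\delta^1) = \sum_{m=0}^{\infty} P(N^*(\delta^1) > m)
  \le \frac{1}{c} \sum_{i=1}^n \frac{c_i}{E\alpha_i}$$

$$g(\mathbb{P}) \le k \cdot E N^*(\delta^1) \le \frac{k}{c} \sum_{i=1}^n \frac{c_i}{E\alpha_i} .$$

Q.E.D.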


Theorem 4.2:

Let $g^m(\mathbb{P})$, $g(\mathbb{P})$ be defined as before. Then

$$\lim_{m \to \infty} g^m(\mathbb{P}) = g(\mathbb{P}) .$$
Proof:

Let $\delta^m$ be a strategy which minimizes the $m$ stage expected
searching cost. Let $g^m(\mathbb{P},\delta)$ be the $m$ stage expected
cost and let $g(\mathbb{P},\delta)$ be the total expected searching
cost defined as before. $g^m(\mathbb{P},\delta)$ is a monotone
increasing function of $m$. The same is true for $g^m(\mathbb{P})$.
Since $g^m(\mathbb{P}) \le g(\mathbb{P}) \le M$ (a constant),
$g^m(\mathbb{P})$ converges in $m$. Let $P^m(\delta)$ be the
probability of finding the target in $m$ searches by following
strategy $\delta$. Then

$$g(\mathbb{P},\delta^m) = g^m(\mathbb{P},\delta^m) + [1 - P^m(\delta^m)]\, g(T^m \mathbb{P})$$

where $T^m \mathbb{P}$ is the posterior probability after using
$\delta^m$ for $m$ stages without finding the target. Now
$M \ge g(T^m \mathbb{P})$ and $P^m(\delta^m) \to 1$ (otherwise
$g^m(\mathbb{P}) \to \infty$). Hence
$g(\mathbb{P},\delta^m) - g^m(\mathbb{P},\delta^m) \to 0$, where
$g^m(\mathbb{P},\delta^m) = g^m(\mathbb{P})$. Suppose
$g^m(\mathbb{P}) \to K < g(\mathbb{P})$. Then for $N$ large enough,
$g(\mathbb{P},\delta^N) < g(\mathbb{P})$, which is a contradiction.
Hence, $\lim_{m \to \infty} g^m(\mathbb{P}) = g(\mathbb{P})$.

Q.E.D.


Theorem 4.3:

Let $\mathbb{P} = (p_1, \ldots, p_n)$ be the state at a certain time.
To minimize the expected searching cost, an optimal strategy first
searches a box with $\max_i \dfrac{p_i E\alpha_i}{c_i}$.

Proof:

The proof will be carried out by considering an $m$ stage searching
process and then letting $m$ go to infinity. Let the initial state
$\mathbb{P}$ be as given. Let $m$ be any positive integer. Consider an
$m$ stage searching process. Let box $j$ be a box with
$\max_i \dfrac{p_i E\alpha_i}{c_i}$.

Define the following strategies:

$\delta^1$ = the strategy which minimizes the $m$ stage expected
searching cost given that it searches box $j$ at the $m$th stage.

$\delta^2$ = an optimal strategy which minimizes the $m$ stage
expected searching cost.

$\delta^3$ = the strategy which first searches box $j$ and then
continues optimally.

For any strategy $\delta$, let $g^m(\mathbb{P},\delta)$ be defined as
the $m$ stage expected cost and let
$g^m(\mathbb{P}) = \inf_\delta g^m(\mathbb{P},\delta)$ as before. Also
let $g(\mathbb{P})$ be the minimum expected cost before finding the
target ($m = \infty$).

By Theorem 4.2, for any state $\mathbb{P}$,

$$g^m(\mathbb{P}) \to g(\mathbb{P}) .$$

Recall that $T_i \mathbb{P}$ is the posterior probability for the next
stage after searching box $i$ without finding the target. Then by
definition,

$$g^m(\mathbb{P},\delta^3) = c_j + E\,(1 - \alpha_j p_j)\, g^{m-1}(T_j \mathbb{P})$$

$$g^m(\mathbb{P},\delta^3) \to c_j + E\,(1 - \alpha_j p_j)\, g(T_j \mathbb{P}) \quad \text{as } m \to \infty .$$

Now by dynamic programming,

$$g(\mathbb{P}) = \min_i \left\{ c_i + E\,(1 - \alpha_i p_i)\, g(T_i \mathbb{P}) \right\} .$$
Hence if one can show $g^m(\mathbb{P},\delta^3) \to g(\mathbb{P})$,
then an optimal strategy first searches box $j$ for an infinite stage
process.

In order to show $g^m(\mathbb{P},\delta^3) \to g(\mathbb{P})$, it
suffices to prove the following two parts:

(a) $g^m(\mathbb{P},\delta^3) \le g^m(\mathbb{P},\delta^1)$.

(b) As $m \to \infty$,
$g^m(\mathbb{P},\delta^1) - g^m(\mathbb{P},\delta^2) \to 0$ and
$g^m(\mathbb{P},\delta^2) \to g(\mathbb{P})$.

To prove (a), induction will be used. When $m = 1$,
$g^m(\mathbb{P},\delta^3) = g^m(\mathbb{P},\delta^1) = c_j$, so (a) is
trivially true. It will be verified that if (a) holds for
$m = r - 1$ for any $\mathbb{P}$, then it holds for $m = r$ as well.

Loosely speaking,
$g^m(\mathbb{P},\delta^3) \le g^m(\mathbb{P},\delta^1)$ means that if
box $j$ has $\max_i \dfrac{p_i E\alpha_i}{c_i}$, then searching box
$j$ first is no worse than searching box $j$ last, when the optimal
decision is made at the other stages.

Suppose when $m = r$, $\delta^1$ searches box $k$ first. If $k = j$,
the case is trivial. So assume $k \ne j$. After the first search, if
the target is not found, let the posterior probability be
$\mathbb{P}' = (p_1', p_2', \ldots, p_n')$. Let ... be defined as
before. To simplify notation, let ..., $i = 1, \ldots, n$. Then, for
any $i$ ...
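... $r - 1$ stages. By induction hypothesis, searching box $j$ next is
no worse than searching box $j$ last.

Let $S_k S_j \delta^*$ be the strategy that searches first box $k$,
then box $j$, then follows an optimal strategy $\delta^*$. Then
$S_k S_j \delta^*$ is no worse than $\delta^1$ for the $r$ stage
process, i.e.,

$$g^r(\mathbb{P}, S_k S_j \delta^*) \le g^r(\mathbb{P},\delta^1) .$$

Let $S_j S_k \delta^*$ be similarly defined. If one can show

$$g^r(\mathbb{P}, S_j S_k \delta^*) \le g^r(\mathbb{P}, S_k S_j \delta^*)$$

then
$g^r(\mathbb{P},\delta^3) \le g^r(\mathbb{P}, S_j S_k \delta^*)
 \le g^r(\mathbb{P}, S_k S_j \delta^*) \le g^r(\mathbb{P},\delta^1)$,
and (a) will be proven. Now

$$\begin{aligned}
g^r(\mathbb{P}, S_j S_k \delta^*) - g^r(\mathbb{P}, S_k S_j \delta^*)
 ={}& c_j + (1 - p_j E\alpha_j)\, c_k
   + E\,(1 - \alpha_j p_j - \alpha_k' p_k)\, g^{r-2}(T_j T_k \mathbb{P}) \\
  & - c_k - (1 - p_k E\alpha_k)\, c_j
   - E\,(1 - \alpha_k p_k - \alpha_j' p_j)\, g^{r-2}(T_k T_j \mathbb{P})
\end{aligned}$$

where $T_j T_k \mathbb{P}$ is the posterior probability after
searching first box $j$ then box $k$ without finding the target.
Following the arguments as used in the proof of Theorem 4.1 yields

$$g^r(\mathbb{P}, S_j S_k \delta^*) - g^r(\mathbb{P}, S_k S_j \delta^*)
  = -p_j E\alpha_j\, c_k + p_k E\alpha_k\, c_j
  = c_j c_k \left( \frac{p_k E\alpha_k}{c_k} - \frac{p_j E\alpha_j}{c_j} \right) \le 0 .$$

Hence (a) is proven.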
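The interchange step in part (a) reduces to comparing the expected
cost of the first two searches, because the continuation terms cancel
in expectation. A small exact-arithmetic sketch of that comparison
follows; the function names and the numerical values of $p_i$,
$E\alpha_i$ and $c_i$ are assumptions made for illustration.

```python
from fractions import Fraction


def two_search_cost(first, second, p, mean_alpha, cost):
    """Expected cost of searching `first`, then `second` if the first search fails."""
    miss_first = 1 - p[first] * mean_alpha[first]
    return cost[first] + miss_first * cost[second]


def interchange_gap(j, k, p, mean_alpha, cost):
    """Cost of (j then k) minus cost of (k then j) over the first two searches."""
    return (two_search_cost(j, k, p, mean_alpha, cost)
            - two_search_cost(k, j, p, mean_alpha, cost))


def closed_form_gap(j, k, p, mean_alpha, cost):
    """c_j c_k (p_k E[alpha_k] / c_k - p_j E[alpha_j] / c_j)."""
    return cost[j] * cost[k] * (p[k] * mean_alpha[k] / cost[k]
                                - p[j] * mean_alpha[j] / cost[j])


# hypothetical three-box data; box 1 has the largest ratio p_i E[alpha_i] / c_i
P = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
MEAN_ALPHA = [Fraction(1, 2), Fraction(4, 5), Fraction(1, 4)]
COST = [Fraction(2), Fraction(1), Fraction(3)]
```

When $j$ is the box with the largest ratio, the gap is nonpositive for
every other box $k$, which is the inequality used to conclude (a).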

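Part (b) is repeated here to be proven:
$g^m(\mathbb{P},\delta^1) - g^m(\mathbb{P},\delta^2) \to 0$ and
$g^m(\mathbb{P},\delta^2) \to g(\mathbb{P})$ as $m \to \infty$. For
any strategy $\delta$ and any integer $N$, let
$f^N(\mathbb{P},\delta)$ be the probability of finding the target in
$N$ searches when $\mathbb{P}$ is the initial state and strategy
$\delta$ is employed. Then by definition of $\delta^1$, $\delta^2$,

$$g^m(\mathbb{P},\delta^1)
  = \inf_{\delta} \left\{ g^{m-1}(\mathbb{P},\delta) + [1 - f^{m-1}(\mathbb{P},\delta)]\, c_j \right\}
  \le g^{m-1}(\mathbb{P},\delta^2) + [1 - f^{m-1}(\mathbb{P},\delta^2)] \cdot \max_i c_i$$

$$g^m(\mathbb{P},\delta^2) \ge g^{m-1}(\mathbb{P},\delta^2) + [1 - f^{m-1}(\mathbb{P},\delta^2)] \cdot \min_i c_i .$$

Hence

$$0 \le g^m(\mathbb{P},\delta^1) - g^m(\mathbb{P},\delta^2)
  \le [1 - f^{m-1}(\mathbb{P},\delta^2)] \cdot [\max_i c_i - \min_i c_i] .$$

By Lemma 4.2, $g(\mathbb{P})$ is bounded. Hence
$g^m(\mathbb{P}) = g^m(\mathbb{P},\delta^2)$ is bounded. If, as
$m \to \infty$, $f^{m-1}(\mathbb{P},\delta^2) \not\to 1$, then there
is a positive probability that the target will never be found. Since
$\min_i c_i > 0$, this would imply that $g^m(\mathbb{P},\delta^2)$
becomes unbounded, which is a contradiction. Therefore

$$g^m(\mathbb{P},\delta^1) - g^m(\mathbb{P}) \to 0 \quad \text{and} \quad
  g^m(\mathbb{P}) \to g(\mathbb{P}) \text{ by Theorem 4.2.}$$

Q.E.D.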


4.3  Random  Overlook  Probabilities  Told  before  the  Search 

In this section, the case where the $\alpha$'s are told before the
search will be analyzed. Thus let $\alpha_i^t$ be the probability of
finding the target when a search is made in box $i$ at time $t$, given
that the target is in box $i$. Again, for any fixed $i$,
$\alpha_i^1, \alpha_i^2, \ldots$ are independent identically
distributed random variables. The fact that the $\alpha_i^t$'s are
told before time $t$ makes it necessary to include the $\alpha_i^t$'s
as part of the state at time $t$. That is, the state at time $t$ now
consists of the posterior probability of the target being in box $i$
at time $t$ (call it $p_i^t$) as well as the $\alpha_i^t$'s.

To simplify the discussion, two types of $\alpha_i^t$'s are chosen.
One is the case where for any fixed $t$,
$\alpha_1^t, \ldots, \alpha_n^t$ are assumed to be independent random
variables. The other is the case where
$\alpha_1^t = \alpha_2^t = \cdots = \alpha_n^t$, i.e., at any time $t$,
the $\alpha_i^t$'s are identical for all the boxes.

Consider first the case where for fixed $t$,
$\alpha_1^t, \ldots, \alpha_n^t$ are independent random variables. In
the problem of maximizing the probability of finding the target in a
given number of searches, one might conjecture that an optimal
strategy searches the box with $\max_i \alpha_i p_i$ each time. The
following counterexample shows that this is not always true.

Suppose in a two box optimal search problem, the objective is to
maximize the probability of finding the target in two stages. Let the
prior probabilities at the first stage be $p_1$, $p_2$. For any $t$,
let $\alpha_1^t$, $\alpha_2^t$ have the following probability
distribution.

$$\alpha_1^t = \begin{cases} 1 & \text{with probability } a \\ 0 & \text{with probability } 1 - a \end{cases}
  \qquad
  \alpha_2^t = \begin{cases} 1 & \text{with probability } b \\ 0 & \text{with probability } 1 - b \end{cases}$$

Assume that at the beginning of the first stage, $\alpha_1$ and
$\alpha_2$ are told to be $\alpha_1 = \alpha_2 = 1$, while
$p_1 > p_2 > 0$. Then according to the conjecture, an optimal strategy
searches box 1 first. Moreover, since $\alpha_1 = 1$ initially, if the
ball is not found at the first search, then the ball is not in box 1
and one always searches box 2 at the next stage. Let the $\alpha$'s
for the second stage be $\alpha_1'$ and $\alpha_2'$ and consider the
following two strategies. One is to search first box 1 then box 2;
call this strategy $S_1 S_2$. The other is to search first box 2 then
box 1; call it $S_2 S_1$. The probability of finding the target by
using the first strategy is $p_1 + p_2 E\alpha_2'$. The same
probability by using the second strategy is $p_2 + p_1 E\alpha_1'$.
Clearly, if $p_2 (1 - E\alpha_2') > p_1 (1 - E\alpha_1')$, then
$S_1 S_2$ is not optimal and the conjecture is wrong.

It is easy to see that for some suitably chosen $a$ and $b$ in the
probability distribution of $\alpha$, namely, for
$p_2 (1 - b) > p_1 (1 - a)$, the counterexample is established.
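A concrete instance of the counterexample can be computed exactly.
The sketch below is illustrative only; the function names and the
values $p_1 = 3/5$, $p_2 = 2/5$, $a = 9/10$, $b = 1/10$ are
assumptions chosen so that $p_2(1 - b) > p_1(1 - a)$. It uses
$E\alpha_1' = a$ and $E\alpha_2' = b$ for the two-point distributions
above.

```python
from fractions import Fraction


def prob_S1_S2(p1, p2, a, b):
    """Search box 1 (alpha_1 = 1), then box 2: p1 + p2 * E[alpha_2']."""
    return p1 + p2 * b


def prob_S2_S1(p1, p2, a, b):
    """Search box 2 (alpha_2 = 1), then box 1: p2 + p1 * E[alpha_1']."""
    return p2 + p1 * a


def conjecture_fails(p1, p2, a, b):
    """True when the order preferred by max alpha_i p_i is strictly worse."""
    return p2 * (1 - b) > p1 * (1 - a)


p1, p2 = Fraction(3, 5), Fraction(2, 5)
a, b = Fraction(9, 10), Fraction(1, 10)
```

Here the conjectured order $S_1 S_2$ finds the target with
probability $16/25$, whereas $S_2 S_1$ achieves $47/50$.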

The following definitions will be used in the theorems that come
later.

Let $(\mathbb{P},\alpha)$ be the state at a certain stage, where
$\mathbb{P} = (p_1, p_2, \ldots, p_n)$ is the posterior probability
vector at that stage and
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is the vector of
$\alpha_i$'s at that stage. The superscript $t$ in $\alpha^t$ is
suppressed since the time is understood to be that specified stage.

A strategy $\delta$ is defined to be any rule for determining which
box to search at a given state. Since the prior probabilities are
given, a strategy $\delta$ yields a searching sequence which depends
on the $\alpha_i$'s which are told each time before the search.

For any strategy $\delta$, any integer $m$ and any initial state
$(\mathbb{P},\alpha)$, let $f^m(\mathbb{P},\alpha;\delta)$ = the
probability of finding the target in $m$ searches when
$(\mathbb{P},\alpha)$ is the state at the first stage and strategy
$\delta$ is employed thereafter.

Let $f^m(\mathbb{P},\alpha) = \sup_\delta f^m(\mathbb{P},\alpha;\delta)$.
For any $i$, let $f_i^m(\delta)$ = the conditional probability of
finding the target in $m$ searches given that the target is in box $i$
and strategy $\delta$ is employed.

Then

$$E f^m(\mathbb{P},\alpha;\delta) = \sum_{i=1}^{n} p_i\, f_i^m(\delta) ,$$

where the expectation is taken over $\alpha$.

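Theorem 4.4:

Let box $i$ be any box and $(\mathbb{P},\alpha)$ be any state. Let
$\alpha' = (\alpha_1', \alpha_2', \ldots, \alpha_n')$ be another
$\alpha$ vector such that

$$\alpha_i' \ge \alpha_i , \qquad \alpha_k' \le \alpha_k \quad \forall\, k \ne i .$$

Suppose an optimal strategy first searches box $i$ at state
$(\mathbb{P},\alpha)$. Then an optimal strategy first searches box $i$
at state $(\mathbb{P},\alpha')$ as well.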

Proof: 

For  a  given  initial  state  (P  ,a)  ,  and  for  any  j  ,  leu  T^IP 
be  the  posterior  probability  vector  for  the  next  stage  given  tha,".  the 
present  search  of  box  j  has  not  uncovered  the  target.  Thus 

T.W  =  [(T.P)1,  ....  (T..lP)n]  ,  where 

(pr(1  "  <r  *  3) 

(<l  "  <Vpj*1  ~  aipj)_1  =  J)  • 

Let  box  i  be  the  box  specified  above. 

■k 

For  the  initial  state  (P  ,a)  ,  let  S^6  be  the  strategy  of  first 

* 

searching  box  i  then  following  an  optimal  strategy  <5  .  Let  box  k 


be  any  box.  For  the  initial  state  (IP  ,cr)  ,  let  be  the  strategy 

of  first  searching  box  k  ,  k  ^  i  then  following  an  optimal  strategy 
6 '  .  Let  a  be  the  a  vector  for  the  next  stage.  Then 

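$$\begin{aligned}
f^m(\mathbb{P}, \alpha) &= f^m(\mathbb{P}, \alpha; S_i\delta^*) = \alpha_i p_i + (1 - \alpha_i p_i)\, E f^{m-1}(T_i\mathbb{P}, \alpha^2) \\
&= \alpha_i p_i + (1 - \alpha_i p_i) \sum_{j=1}^{n} (T_i\mathbb{P})_j\, f_j^{m-1}(\delta^*) \\
&= \alpha_i p_i + (1 - \alpha_i)\, p_i f_i^{m-1}(\delta^*) + \sum_{j \ne i} p_j f_j^{m-1}(\delta^*) \\
&= \alpha_i p_i \left[ 1 - f_i^{m-1}(\delta^*) \right] + \sum_{j=1}^{n} p_j f_j^{m-1}(\delta^*).
\end{aligned}$$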


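Since $\alpha_i' \ge \alpha_i$, $\alpha_k' \le \alpha_k$,

$$\begin{aligned}
f^m(\mathbb{P}, \alpha'; S_i\delta^*) &= \alpha_i' p_i \left[ 1 - f_i^{m-1}(\delta^*) \right] + \sum_{j=1}^{n} p_j f_j^{m-1}(\delta^*) \\
&\ge f^m(\mathbb{P}, \alpha; S_i\delta^*) = f^m(\mathbb{P}, \alpha) \ge f^m(\mathbb{P}, \alpha; S_k\delta') \\
&= \alpha_k p_k \left[ 1 - f_k^{m-1}(\delta') \right] + \sum_{j=1}^{n} p_j f_j^{m-1}(\delta') \\
&\ge \alpha_k' p_k \left[ 1 - f_k^{m-1}(\delta') \right] + \sum_{j=1}^{n} p_j f_j^{m-1}(\delta') \\
&= f^m(\mathbb{P}, \alpha'; S_k\delta') = \alpha_k' p_k + (1 - \alpha_k' p_k)\, E f^{m-1}(T_k\mathbb{P}, \alpha^2).
\end{aligned}$$

Hence an optimal strategy first searches box $i$ at state $(\mathbb{P}, \alpha')$.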

Q.E.D. 
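The algebraic step that drives the proof, namely that the value of "search box $i$ first, then follow $\delta^*$" equals $\alpha_i p_i [1 - f_i^{m-1}(\delta^*)] + \sum_j p_j f_j^{m-1}(\delta^*)$ and is therefore nondecreasing in $\alpha_i$, can be checked numerically. The sketch below is an illustration only: the function names are invented here, and the list `f` holds arbitrary hypothetical values standing in for the conditional probabilities $f_j^{m-1}(\delta^*)$.

```python
def value_direct(p, alpha, f, i):
    """alpha_i p_i + (1 - alpha_i p_i) * sum_j (T_i P)_j f_j, computed step by step."""
    miss = 1.0 - alpha[i] * p[i]
    # Posterior after an unsuccessful search of box i (0-based index).
    post = [((1.0 - alpha[i]) * p_r if r == i else p_r) / miss
            for r, p_r in enumerate(p)]
    return alpha[i] * p[i] + miss * sum(q * f_j for q, f_j in zip(post, f))


def value_closed(p, alpha, f, i):
    """Closed form alpha_i p_i (1 - f_i) + sum_j p_j f_j from the proof."""
    return alpha[i] * p[i] * (1.0 - f[i]) + sum(p_j * f_j for p_j, f_j in zip(p, f))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]          # prior probabilities (illustrative)
    f = [0.3, 0.7, 0.2]          # hypothetical f_j^{m-1}(delta*) values
    for a0 in (0.4, 0.9):        # raise alpha_i for box i = 0
        alpha = [a0, 0.5, 0.6]
        print(a0, value_direct(p, alpha, f, 0), value_closed(p, alpha, f, 0))
```

Because $1 - f_i^{m-1}(\delta^*) \ge 0$, the closed form can only grow when $\alpha_i$ grows, which is the first inequality used in the proof.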

Assume searching any box $i$ costs $c_i > 0$. Now consider the
problem of minimizing the expected searching cost. For any strategy $\delta$
and any initial state $(\mathbb{P}, \alpha)$, let $g(\mathbb{P}, \alpha; \delta)$ = the expected searching
cost when $(\mathbb{P}, \alpha)$ is the state at the first stage and strategy $\delta$
is employed thereafter.

Let $g(\mathbb{P}, \alpha) = \inf_{\delta} g(\mathbb{P}, \alpha; \delta)$. For any box $i$, let $g_i(\delta)$ = the
conditional expected searching cost given that the target is in box $i$
and strategy $\delta$ is employed. Then

$$E g(\mathbb{P}, \alpha; \delta) = \sum_{i=1}^{n} p_i g_i(\delta).$$

Theorem  4.5: 

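Let box $i$ be any box, $(\mathbb{P}, \alpha)$ be any state. Let
$\alpha' = (\alpha_1', \ldots, \alpha_n')$ be another $\alpha$ vector such that

$$\alpha_i' \ge \alpha_i, \qquad \alpha_k' \le \alpha_k \quad \forall\, k \ne i.$$

Suppose an optimal strategy first searches box $i$ at state $(\mathbb{P}, \alpha)$.

Then an optimal strategy first searches box $i$ at state $(\mathbb{P}, \alpha')$ as well.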

Proof: 

Let $T_j\mathbb{P}$ be defined as before. For the initial state $(\mathbb{P}, \alpha)$,
let $S_i\delta^*$ be the strategy of first searching box $i$ then following
an optimal strategy $\delta^*$. For the initial state $(\mathbb{P}, \alpha')$, let $S_k\delta'$
be the strategy of first searching box $k$, $k \ne i$, then following an
optimal strategy $\delta'$. Let $\alpha^2$ be the $\alpha$ vector for the next stage.


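Then

$$\begin{aligned}
g(\mathbb{P}, \alpha) &= g(\mathbb{P}, \alpha; S_i\delta^*) = c_i + (1 - \alpha_i p_i)\, E g(T_i\mathbb{P}, \alpha^2) \\
&= c_i + (1 - \alpha_i p_i) \sum_{j=1}^{n} (T_i\mathbb{P})_j\, g_j(\delta^*) \\
&= c_i + (1 - \alpha_i)\, p_i g_i(\delta^*) + \sum_{j \ne i} p_j g_j(\delta^*) \\
&= c_i - \alpha_i p_i g_i(\delta^*) + \sum_{j=1}^{n} p_j g_j(\delta^*).
\end{aligned}$$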


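Since $\alpha_i' \ge \alpha_i$, $\alpha_k' \le \alpha_k$,

$$\begin{aligned}
g(\mathbb{P}, \alpha'; S_i\delta^*) &= c_i - \alpha_i' p_i g_i(\delta^*) + \sum_{j=1}^{n} p_j g_j(\delta^*) \\
&\le g(\mathbb{P}, \alpha; S_i\delta^*) = g(\mathbb{P}, \alpha) \le g(\mathbb{P}, \alpha; S_k\delta') \\
&= c_k - \alpha_k p_k g_k(\delta') + \sum_{j=1}^{n} p_j g_j(\delta') \le g(\mathbb{P}, \alpha'; S_k\delta').
\end{aligned}$$

Hence an optimal strategy first searches box $i$ at state $(\mathbb{P}, \alpha')$ as well.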

Q.E.D. 

Consider the case where for fixed time $t$, $\alpha_i^t = \alpha^t$ for all $i$, i.e., all
the boxes have the same overlook probabilities. In the problem of
maximizing the probability of finding the target in a given number of
searches, one might conjecture again that an optimal strategy first
searches the box with $\max_i p_i$. The conjecture is not always true.

Another conjecture is that if $\mathbb{P} = (p_1, \ldots, p_n)$, $\alpha$ is given and it is
optimal to first search box $i$ at state $(\mathbb{P}, \alpha)$ then it is also optimal
to first search box $i$ at state $(\mathbb{P}', \alpha)$ where $\mathbb{P}' = (p_1', \ldots, p_n')$,
$p_i' \ge p_i$, $p_k' \le p_k$ $\forall\, k \ne i$. Note that the first conjecture implies
the second one. A counterexample is given below to show that even the
second conjecture is not always true.




Let there be two boxes. Consider a two stage optimal search
problem. The objective is to maximize the probability of finding the
target in two searches. Notice that $\alpha_1^t = \alpha_2^t = \alpha^t$ in this case.
Suppose $\alpha^1 = \alpha$, $\alpha^2 = \alpha'$, and the prior probability vector is $\mathbb{P} = (p_1, p_2)$.
For the last stage (the second stage), obviously the box with the larger
probability of containing the target ought to be searched. If the
second conjecture were true, then there would be a number $A > 0$ such
that an optimal strategy first searches box 1 when $p_1/p_2 \ge A$, and first
searches box 2 when $p_1/p_2 \le A$. This will be disproved.

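Assume $\alpha < E\alpha'$ and assume first $(1 - \alpha)p_1 \ge p_2$. The probability
of finding the target by searching first box 1 then the box with larger
posterior probability of containing the target is

$$\begin{aligned}
&\alpha p_1 + (1 - \alpha p_1)\,(E\alpha')\,\max\left\{ \frac{(1 - \alpha)p_1}{1 - \alpha p_1},\ \frac{p_2}{1 - \alpha p_1} \right\} \\
&\quad = \alpha p_1 + (E\alpha')\,\max\{(1 - \alpha)p_1,\ p_2\} = \alpha p_1 + (E\alpha')(1 - \alpha)p_1
\end{aligned} \tag{*}$$

since $(1 - \alpha)p_1 \ge p_2$, by assumption. The probability defined the
same way as above except that box 2 is searched first is

$$\alpha p_2 + (E\alpha')\,\max\{p_1,\ (1 - \alpha)p_2\} = \alpha p_2 + (E\alpha')\,p_1 \tag{**}$$

since $p_1 \ge p_2 \ge (1 - \alpha)p_2$. Subtracting (**) from (*) yields

$$\alpha\left[ p_1(1 - E\alpha') - p_2 \right].$$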


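So if $p_1(1 - \alpha) \ge p_2$, an optimal strategy first searches box 1 or
box 2 depending on whether $p_1(1 - E\alpha') - p_2$ is positive or negative.
By symmetry, if $p_2(1 - \alpha) \ge p_1$, an optimal strategy first searches
box 2 or box 1 depending on whether $p_2(1 - E\alpha') - p_1$ is positive or negative.

It follows that under the assumption that $\alpha < E\alpha'$, the optimal strategy
searches box 1 when

$$\frac{p_2}{p_1} \le 1 - E\alpha' \quad \text{or when} \quad 1 - \alpha \ge \frac{p_1}{p_2} \ge 1 - E\alpha',$$

and searches box 2 when

$$\frac{p_1}{p_2} \le 1 - E\alpha' \quad \text{or when} \quad 1 - \alpha \ge \frac{p_2}{p_1} \ge 1 - E\alpha'.$$

In particular, box 1 is strictly better for $1 - E\alpha' < p_1/p_2 \le 1 - \alpha$, while
box 2 is strictly better for the larger ratios $(1 - \alpha)^{-1} \le p_1/p_2 < (1 - E\alpha')^{-1}$,
so no threshold $A$ of the required kind can exist.

Hence the second conjecture is false.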
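The counterexample can be reproduced numerically. The sketch below is an illustration under assumed values ($\alpha = 0.2$ and $E\alpha' = 0.6$, chosen here so that $\alpha < E\alpha'$); the function names are not from the text. Since both boxes share the same second-stage detection probability, only its mean enters the two-search success probability.

```python
def two_stage_values(p1, alpha, e_alpha2):
    """Success probabilities of the two-search problem.

    Returns (value when box 1 is searched first, value when box 2 is searched first).
    alpha    : first-stage detection probability (common to both boxes)
    e_alpha2 : mean of the second-stage detection probability E[alpha']
    """
    p2 = 1.0 - p1
    # After a miss, the second search goes to the box with the larger
    # (unnormalized) posterior and succeeds with mean probability e_alpha2.
    v1 = alpha * p1 + e_alpha2 * max((1.0 - alpha) * p1, p2)
    v2 = alpha * p2 + e_alpha2 * max(p1, (1.0 - alpha) * p2)
    return v1, v2


def best_first_box(p1, alpha, e_alpha2, tol=1e-12):
    """Return 1 or 2 for the better first box, or 0 if the two values tie."""
    v1, v2 = two_stage_values(p1, alpha, e_alpha2)
    if abs(v1 - v2) < tol:
        return 0
    return 1 if v1 > v2 else 2


if __name__ == "__main__":
    alpha, e_alpha2 = 0.2, 0.6   # assumed illustrative values
    for p1 in (0.2, 0.375, 0.625, 0.8):
        print(p1, two_stage_values(p1, alpha, e_alpha2),
              best_first_box(p1, alpha, e_alpha2))
```

With these assumed values the best first box is 2, 1, 2, 1 as $p_1$ increases through 0.2, 0.375, 0.625, 0.8, so the optimal first search is not monotone in $p_1/p_2$ and no single threshold $A$ separates the two decisions.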

